Investigation into alternative middleware solutions

As an OpenCyphal (formerly UAVCAN) maintainer, I am delighted to see that it has already been mentioned in this thread. The topic of a Cyphal-based RMW resurfaces on our forum every now and then.

The critical features of Cyphal that one might want to focus on when comparing it against DDS are its very lean, bare-bones design with zero code bloat and its focus on functional correctness, verifiability, and applicability to hard real-time systems. We have a basic introduction on the front page if this sounds interesting: https://opencyphal.org.

Considering the commitment of OpenCyphal to simple designs, I imagine that building a custom RMW based on it is not a major hurdle, and it would serve as a good alternative to the complexity of DDS for relevant applications. Would the community like to explore this further with us?

The big question is whether MQTT can handle all the possible data loads that can be expected in a ROS system. Can you imagine, e.g., streaming both a 4K 30 fps MJPEG stream and a 400 Hz IMU over MQTT?
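
To put that question in numbers, here is a rough Python estimate of the two example streams. The per-message sizes (`ASSUMED_MJPEG_FRAME_BYTES`, `ASSUMED_IMU_MSG_BYTES`) are illustrative assumptions, not measurements; the point is only that the two loads have very different shapes: a few large messages per second versus many small ones.

```python
from __future__ import annotations

# Back-of-the-envelope load estimate for the two example streams.
# All payload sizes are ASSUMPTIONS for illustration, not measurements.
ASSUMED_MJPEG_FRAME_BYTES = 500_000  # hypothetical average size of one 4K MJPEG frame
ASSUMED_IMU_MSG_BYTES = 320          # hypothetical serialized size of one IMU sample


def stream_load(rate_hz: float, msg_bytes: int) -> tuple[float, float]:
    """Return (messages per second, megabits per second) for one stream."""
    bits_per_second = rate_hz * msg_bytes * 8
    return rate_hz, bits_per_second / 1e6


video_rate, video_mbps = stream_load(30, ASSUMED_MJPEG_FRAME_BYTES)
imu_rate, imu_mbps = stream_load(400, ASSUMED_IMU_MSG_BYTES)

print(f"video: {video_rate:.0f} msg/s, {video_mbps:.1f} Mbit/s")
print(f"imu:   {imu_rate:.0f} msg/s, {imu_mbps:.3f} Mbit/s")
```

With these assumed sizes, the video stream is about 120 Mbit/s carried in 30 large messages per second, while the IMU stream is about 1 Mbit/s spread over 400 small messages per second, so a single transport would have to cope with both large payloads and high message rates.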

No, not directly. MQTT has message size limits.

We use it for cloud-to-robot communication in Mission Dispatch to give tasks to ROS 2 robots; we have to sideband larger data, including map updates, event recordings, and teleop, as they are too large to go over MQTT.
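
A minimal Python sketch of the general sideband idea, assuming a hypothetical per-message limit (`MAX_MQTT_PAYLOAD_BYTES`) and a hypothetical out-of-band store (`SidebandStore`): payloads that fit are sent inline, larger ones are stored elsewhere and only a short reference goes over MQTT. This illustrates the pattern only; it is not how Mission Dispatch actually implements it.

```python
from __future__ import annotations

import hashlib

# Hypothetical per-message limit; real brokers and cloud services set their own.
MAX_MQTT_PAYLOAD_BYTES = 128 * 1024

INLINE_TAG = b"I"  # message body is the payload itself
REF_TAG = b"R"     # message body is a key into the sideband store


class SidebandStore:
    """Hypothetical stand-in for an out-of-band channel (e.g. an HTTP or object-storage upload)."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()  # content-addressed key, 64 hex chars
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def build_mqtt_message(payload: bytes, store: SidebandStore) -> bytes:
    """Send small payloads inline; replace large ones with a reference to the sideband copy."""
    if len(INLINE_TAG) + len(payload) <= MAX_MQTT_PAYLOAD_BYTES:
        return INLINE_TAG + payload
    return REF_TAG + store.put(payload).encode("ascii")


def resolve_mqtt_message(message: bytes, store: SidebandStore) -> bytes:
    """Receiver side: recover the original payload, fetching it from the sideband if needed."""
    tag, body = message[:1], message[1:]
    if tag == INLINE_TAG:
        return body
    if tag == REF_TAG:
        return store.get(body.decode("ascii"))
    raise ValueError(f"unknown message tag: {tag!r}")
```

The size check counts the one-byte tag, so an inline message never exceeds the assumed limit; a map update of several megabytes becomes a 65-byte MQTT message plus a separate transfer.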

Then I can’t imagine MQTT becoming an RMW.

Can OpenCyphal transmit large messages over lossy links like wifi? On the front page, Ethernet support is listed as UDP-only and work-in-progress. How would retransmission of lost packets work?

Cyphal is primarily designed for intra-vehicular/intra-robot networks where the packet loss model is different compared to a typical WiFi network. There is no built-in confirmation/retry mechanism in Cyphal, but there is a tunable, fairly simple forward error correction support allowing one to choose the trade-off between the bandwidth overhead and the loss probability. Would it be interesting to evaluate a demo?
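
As a rough model of that trade-off, assume (for the sake of the model) that the FEC works by sending each transfer k times and that each copy is lost independently with probability p. WiFi losses are often bursty, which makes plain repetition less effective than this model suggests. Under these assumptions a transfer is lost only if all copies are, with probability p^k, at a cost of k times the bandwidth. The helper names `residual_loss` and `min_copies` are hypothetical:

```python
def residual_loss(p: float, copies: int) -> float:
    """Probability that all `copies` transmissions are lost, each independently with probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if copies < 1:
        raise ValueError("need at least one copy")
    return p ** copies


def min_copies(p: float, target: float, max_copies: int = 16) -> int:
    """Smallest number of copies (= bandwidth multiplier) whose residual loss is at most `target`."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")
    for k in range(1, max_copies + 1):
        if residual_loss(p, k) <= target:
            return k
    raise ValueError("target not reachable within max_copies")
```

For example, with 5% per-copy loss, reaching a residual loss of at most 1e-4 takes 4 copies under this model (3 copies give 1.25e-4), i.e. four times the bandwidth.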

Cyphal/serial, which can also run over TCP, is on the roadmap as well, but it is not a priority at this stage.

I think there is little use in everyone voicing their preferred communication library for an RMW implementation.

After all:

  1. That’s what the survey is for, and
  2. Ideally nobody really cares about the RMW, as long as it just works for your use case.

IMO, the use cases should be the primary focus, as they define the requirements, which should be gathered into formal project specifications.

The eventual choice of library A, B or C (or rather keeping focus on DDS) should be determined only by how well the specifications can be realized with that choice.

So my suggestion is to post desired functionality rather than yet another suggestion for NextGreatLibraryX.


Of course, there is no such thing as a free lunch, so let me put this into practice by summarizing the desired functionalities mentioned so far:

  • @clalancette:

    • “work in all environments” (multicast UDP might be disabled or other network restrictions)
    • “make something that works with less configuration”
  • @Bernd_Pfrommer:

    • “ease-of-use of the good old ROS1 transport”
    • “every now and then they [stop working]”
    • “asynchronous publishing. Supposedly fastrtps can do it, but configuring it is very complicated”
    • “simple setup and predictable behavior”
    • “transmit large messages over lossy links like wifi”
  • @cosmic:

    • “huge burden of maintenance and security” (i.e. take into account limited developer resources)
  • @thejeeb:

    • “easier to use for the vast majority of use cases”
    • “scaling to larger systems”
    • “inertia of adding additional DDS features”
  • @Jaime_Martin_Losa:

    • “better docs and tutorials”
    • “scaling to larger systems”
  • @jpace121:

    • “at the expense of adding in a central broker.”
    • “long term maintenance”, “no signs it will go away any time soon”
  • @pavel-kirienko:

    • “very lean, bare-bones design with zero code bloat”
    • “functional correctness”
    • “verifiability”
    • “applicability to hard real-time systems”
  • @peci1:

    • “handle all the possible data loads that can be expected in a ROS system”
      • “E.g. both streaming a 4k 30fps Mjpeg stream and a 400 Hz IMU”
  • My own wishlist:

    • Cross-platform availability,
    • Cross-distribution compatibility (e.g. Galactic to Iron, etc.),
    • Efficient inter-process communication,
    • Thorough documentation (don’t waste this opportunity!),
    • Support for security features,
    • My focus is on industrial manipulation tasks:
      • Small-scale settings (e.g. a pick and place task with 1 to 5 pickers and 1 to 5 perception systems such as camera or point cloud sensor),
      • In a relatively fixed configuration (i.e. no roaming robots that unexpectedly can come into the same network segment as other robots, etc).

Agreed with @JRTG.

Adding another item: by default, not communicating with other robots on the same network and not flooding the routers with tons of traffic, with a simple, documented flag to allow communication with the outside network.

Another suggestion:

Provide a means for time synchronisation between distributed systems

  • Not sure if this should be part of the RMW,
  • But as it is closely related to the communication between the systems, maybe it isn’t illogical for this to be part of the RMW?

This doesn’t make sense. There are NTP, PTP and other well-established protocols. Maybe the RMW could somehow check whether time is reasonably synced, but it definitely should not act as the sync agent.
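
For the “check whether time is reasonably synced” idea, one option is the standard NTP on-wire calculation (RFC 5905): from the four timestamps of one request/response exchange with a peer, estimate the clock offset and round-trip delay, then compare the offset against a tolerance. A Python sketch; the `Exchange` type, `clocks_reasonably_synced`, and the 5 ms default tolerance are hypothetical, and the estimate assumes symmetric network delay:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Exchange:
    """Timestamps (in seconds) of one request/response exchange with a peer.

    t1: request sent (local clock)     t2: request received (peer clock)
    t3: response sent (peer clock)     t4: response received (local clock)
    """

    t1: float
    t2: float
    t3: float
    t4: float

    def offset(self) -> float:
        """Estimated peer clock minus local clock, assuming symmetric path delay."""
        return ((self.t2 - self.t1) + (self.t3 - self.t4)) / 2

    def delay(self) -> float:
        """Round-trip network delay, excluding the peer's processing time."""
        return (self.t4 - self.t1) - (self.t3 - self.t2)


def clocks_reasonably_synced(exchanges: list[Exchange], tolerance_s: float = 0.005) -> bool:
    """Hypothetical check: trust the lowest-delay exchange and compare its offset to a tolerance."""
    if not exchanges:
        raise ValueError("need at least one exchange")
    best = min(exchanges, key=lambda e: e.delay())
    return abs(best.offset()) <= tolerance_s
```

Such a check only observes; it never adjusts the clock, so the actual synchronization stays with NTP/PTP.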

@peci1

Literally everyone setting up a distributed ROS 2 system needs clock synchronisation, even if it is just setting the system clocks manually. So clock synchronisation is clearly a requirement for a distributed ROS 2 system.

From an end-user point of view, having to figure out how to configure NTP, PTP or whatever on your system, as well as on other possible systems that you want to integrate (and for which you might not even have administrator rights), is clearly less straightforward than just firing up nodes and have them auto-sync their internal clocks.

So I strongly disagree that it “doesn’t make sense”.
To me it makes very much sense.

Whether or not it could be a (current) development priority is of course another topic.
But in my experience it is best practice to gather requirements first, and only then formally decide on which ones to keep as project specifications.


I have no micro-ROS experience other than a superficial look at it, so I might be wrong, but based on that I conclude that the micro-ROS RMW does provide support for time synchronisation.
Again, to me this makes sense as it’s the communication layer.

As DDS middlewares fill the niche of infinitely tunable knobs for those with that need or desire, I would prefer rmw_new to be more like ROS 1, which works out of the box with no configuration for most people.

For this, it needs to be kind to networks (no multicast DoS attack), choose reliable communication over throughput most of the time, and start up and tear down quickly and reliably without leaving your system in a weird state. In my opinion, requiring a central node (like the ROS 1 master) would be good if it made the system more robust and easier to use.
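
The role of such a central node can be sketched as a registry that only brokers discovery: publishers register their endpoint per topic, subscribers look them up and then connect directly, so no multicast is needed and the registry stays off the data path. This in-process Python sketch (`NameRegistry` and the endpoint strings are hypothetical) omits the networking, liveliness tracking, and update notifications a real master would need:

```python
from __future__ import annotations


class NameRegistry:
    """Hypothetical central registry, loosely modeled on the role of the ROS 1 master.

    It only answers "who publishes topic X?"; message data would then flow
    directly between nodes, without passing through the registry.
    """

    def __init__(self) -> None:
        # topic name -> set of publisher endpoints (e.g. "udp://host:port")
        self._publishers: dict[str, set[str]] = {}

    def register_publisher(self, topic: str, endpoint: str) -> None:
        self._publishers.setdefault(topic, set()).add(endpoint)

    def unregister_publisher(self, topic: str, endpoint: str) -> None:
        endpoints = self._publishers.get(topic)
        if endpoints is not None:
            endpoints.discard(endpoint)
            if not endpoints:
                del self._publishers[topic]  # forget topics with no publishers left

    def lookup_publishers(self, topic: str) -> list[str]:
        """Return known publisher endpoints for a topic, sorted for determinism."""
        return sorted(self._publishers.get(topic, set()))


# Usage: a subscriber asks the registry once, then connects to each endpoint itself.
registry = NameRegistry()
registry.register_publisher("/imu", "udp://10.0.0.5:7400")
print(registry.lookup_publishers("/imu"))
```

Keeping the registry out of the data path means its failure only affects discovery, not already-established connections, which is roughly the trade-off ROS 1 made.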

@JRTG @peci1 thanks for sharing your thoughts.

About time sync, here is what I think.

So clock synchronisation is clearly a requirement for a distributed ROS 2 system.

I would say not always, but probably for most ROS 2 applications.

From an end-user point of view, having to figure out how to configure NTP, PTP or whatever on your system, as well as on other possible systems that you want to integrate (and for which you might not even have administrator rights), is clearly less straightforward than just firing up nodes and have them auto-sync their internal clocks.

As an end user, I would take the opposite path.

NTP and PTP configuration should be completely independent of the ROS 2 application framework and belong to the general system layer.

That way, we can apply the same NTP or PTP configuration to any other application framework.

Time synchronization is one of the requirements, but it is not specific to ROS 2.

If I found it difficult to set up NTP or PTP, I would develop tools for NTP or PTP (or contribute to existing ones) rather than for ROS 2.

Thanks,

Clock synchronization can be done either at the application level if needed, or through external mechanisms (e.g. NTP). Doing this within the RMW would bloat the RMW design and could lead to confusing interactions, with the RMW and external synchronization mechanisms fighting each other. This should never be done by the RMW.

I was thinking rather along the lines of the (long term) vision where you could:

  • Buy ROS-enabled robot from supplier X,
  • Buy ROS-enabled conveyor from supplier Y, and
  • Buy ROS-enabled laser line scanner from supplier Z

and combine them to build a picking station.

In that case, supplier X, Y and Z need to provide you with some compatible means to synchronize the systems. This could be NTP or whatever, as long as X, Y and Z agree on a standard.
In my ideal world, you just wouldn’t need to care about that.

Now, from the rather pronounced replies I conclude that synchronisation is not at all considered as an issue in need of a solution.

Which is fine, it was a mere suggestion.

Time sync is related to simulation, so (excuse my ignorance of ROS) is the RMW also “simulation-aware”?

Also, hello ROS community. I’m new here. I’m one of the maintainers of OpenCyphal along with Pavel Kirienko, and an engineer at Amazon Prime Air. My ears perked up when someone forwarded me this thread, as I’ve been interested in porting OpenCyphal to ROS for a couple of years now, and I’m excited to see that development of the RMW layer is a priority for the next ROS 2 release!

Amazon’s work with OpenCyphal stems from an interest in communication protocols that can evolve: supporting rapid prototyping at first, integrating with simulations as the project develops, and moving into the final product as the system goes to production. In this case, the middleware is where the ICDs (interface control documents) for a robot system are implemented, so it enables greater agility if the interface technology can follow the system components through the product lifecycle, ensuring continuous interoperability even if individual components mature at different rates from the overall system.

As for the RMW requirements: you’re probably looking to support several/many different middleware solutions? If so, I’d recommend selecting two or more different technologies other than DDS to integrate with as you prove out the independence of the RMW layer. Working with multiple protocol teams, the ROS maintainers could leverage their familiarity with DDS, and the two teams their familiarity with their own solutions, to negotiate conflicts and ensure proper generalization of the design. Of course I’d suggest OpenCyphal be one of these partners, but regardless, the targets chosen should emphasize different use cases that are typically not well supported by DDS. Cyphal’s emphasis is on single-system intra-communication and support for deeply embedded targets. It makes significant simplifications based on the assumption that there is one “robot” made out of many components, and leaves communication with external systems to other protocols like MQTT, or even DDS. Perhaps a counterpoint, then, would be something that emphasizes distributed communication but in more modern modes (e.g. mesh networking or cloud-connected systems).

Heh, my response was delayed waiting for moderator approval so it got a bit behind the discussion.

Time synchronization as part of the middleware does have merit in my view. Not only does precision time rely on the hardware under the transport in use (which is, of course, underneath the middleware in use) but you also want to have the same portability for simulated time in a federated simulation that you have for abstract communications between components. Again, I’m new to ROS so I might be completely misunderstanding how it handles time and simulation. Sorry if this is noise.

The time syncing conversation is interesting, but to me it doesn’t feel like something that should be tied to a specific RMW implementation. Whether I’m using Cyclone, Fast DDS or this new option, I want it to work the same way.

Wouldn’t most of these problems go away with an RMW on top of DDS with extensions for TSN?
To my understanding, the IEEE TSN set of standards will solve most of the reliability and deterministic issues encountered in the past.

It might still be a bit early, since the OMG released the specs only last April (2023), but it is late enough to do some initial prototyping.
https://www.omg.org/news/releases/pr2023/04-17-23.htm

In the industrial “IEC61131/PLCopen/OPC UA”, PLC-based robot programming community, which mostly uses IEC fieldbus interfaces (PROFINET, EtherCAT, EtherNet/IP, etc.), there is the idea of prototyping a robot manipulation message with OPC UA PubSub over TSN, based on this open-source SDK: https://www.open62541.org/

This could be a promising way to finally have one single standard/interface for path waypoint/target position streaming between the controller/edge and the robot controller, and to enable some real-time control for cloud robotics as well.

Also, down the road (>3 years), it could provide an opportunity for a converged message between ROS and the IEC world, eliminating the need for all those drivers/translators from ROS to fieldbuses.

Thanks for all the work your team is doing. I do want to make sure that any new features that are added are supported by all the RMW vendors, so we are not locked into one vendor. I assume the Vulcanexus tools are specific to Fast DDS as the RMW.