ROS 2 and Large Data transfer on lossy networks

Using ROS 2 to transfer large data over unreliable networks (e.g., Wi-Fi) can be a complex task: it often leads to communication issues and packet loss, and typically requires an intricate DDS configuration.

At eProsima, we have worked hard to simplify configuration and enhance performance to address this challenge.

As Fast DDS is the default ROS 2 middleware, our Fast DDS tech team has a solution for large data transfer on lossy networks:

  • Auto-configured built-in transports, which simplify the setup and enhance reliability.
  • An enhanced TCP transport layer that adds reliability and performance.

Moreover, these enhancements are easily accessible.

Users can select any of the following options (a minimal environment-variable example is shown below):

  • Modify an environment variable
  • Modify an XML profile
  • Modify it through the C++ API
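
For instance, a minimal sketch of the environment-variable route (assuming a sourced ROS 2 workspace using Fast DDS as the RMW; demo_nodes_cpp is only an illustrative package, any node is launched the same way):

# Enable the LARGE_DATA builtin transports for every node launched from this shell
export FASTDDS_BUILTIN_TRANSPORTS=LARGE_DATA
# Then run any ROS 2 node as usual; the standard demo talker is used here as an example
ros2 run demo_nodes_cpp talker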

How is it done?

Fast DDS discovers all nodes via UDP and then switches data transfer to TCP, leveraging TCP’s benefits for transferring large data (flow control) while avoiding manual configuration complexities.
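
If you want to see this for yourself on Linux, a quick (hypothetical) check with iproute2 is to list the sockets owned by a talker and a listener running under LARGE_DATA; an established TCP connection should appear alongside the UDP discovery sockets (the -p flag may need matching user permissions or sudo):

# List TCP/UDP sockets with their owning processes and filter by the demo node names
ss -tunap | grep -E 'talker|listener'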

Now, transferring large data (images, video, point clouds) is an easy task over all kinds of data links (Wi-Fi, radio, mobile link…) in both indoor and outdoor scenarios.

This enhancement is particularly valuable in ROS 2 environments, where LiDARs, depth cameras, and Wi-Fi networks are commonplace.

The new feature is already available in Vulcanexus and will soon be available in the general ROS 2 distributions. Download our Docker images or install the packages from our Debian repositories.

Want to know all the details? → 1.2. Large data communication with ROS 2 (a practical example) — Vulcanexus 1.0.0 documentation

12 Likes

Wow, this looks good!

It’s interesting to see how many of the core decisions were made correctly in ROS 1, and ROS 2 slowly circles back to them :smiley: Kudos, Willow Garage :wink:

I have a few questions:

  1. How will the transport behave in case the network is congested? Will it start throwing away single TCP packets, or will it be more similar to the ROS 1 behavior (discarding whole messages)?

  2. Is it really needed to also change the default transport on the receiving end? I.e., when the robot is configured for large data transport, will the notebooks automatically use it, or do both ends need to have it enabled?

Hi @peci1, we are glad you like the feature!

I wouldn’t say this behavior is that of ROS 1, though; it certainly has similarities, but also some key differences. As you know, in ROS 1 the discovery is orchestrated by the ROS master, while when using LARGE_DATA, the discovery is still the automatic, decentralized ROS 2 discovery. In fact, that’s one of the key points: you leverage the benefits of the TCP transport in terms of reliability and built-in flow control without needing to know where your TCP connection server actually is.

Furthermore, I’d like to underline that this feature is not the default behavior for various reasons:

  1. Interoperability with other DDS-based RMWs, which has been a very important capability of ROS 2 up to this point.
  2. It sacrifices the real-time characteristics of UDP.

From our experience, there are many kinds of robotics (and non-robotics) applications out there, and for some of them, using UDP is not only sufficient but preferred. While I do understand the need for a more reliable solution when transmitting large data samples over lossy networks (which has long been a challenge in ROS 2, now overcome thanks to this new feature), I cannot say whether that is a requirement for the majority of the projects out there, no matter how vocal they are.

In fact, I’d say that the community has also suffered from somewhat unrealistic expectations with regard to transmitting, for instance, video feeds: a lot of projects want to transmit raw 4K frames over Wi-Fi and receive them all. While this feature is actually targeting those large-sample use cases specifically, that is not how high-quality video streaming services operate. Instead, they use video encoding/decoding, they normally have dedicated bandwidth on the ISPs, and for the most part, they use UDP.

What I mean to say with all this is that, IMO, the ROS 2 approach is more flexible in this sense, since each application can pick and choose which pieces fit best. At the end of the day, we just want to provide different options to be able to cover as many use cases as possible.

This actually depends. What we can say for sure is that it will start discarding full RTPS packets. Whether that represents whole messages or just fragments depends on the configured fragmentation limit, which by default is set to ~65 kB (mirroring that of the default UDP transport). On the next Vulcanexus sync, this limit and some other parameters will be configurable via the FASTDDS_BUILTIN_TRANSPORTS env var using a URI query-like syntax. For example, you will be able to configure your sockets to have a size of 1 MB and to start fragmenting data when the sample size exceeds 200 KB (full docs about it here); stay tuned:

export FASTDDS_BUILTIN_TRANSPORTS="LARGE_DATA?max_msg_size=200KB&sockets_size=1MB"

Moreover, Fast DDS also offers other solutions for tackling, or at least detecting, congestion issues, ranging from throughput control to monitoring DDS performance by leveraging the Fast DDS Statistics Module (see Statistics Module & Fast DDS Monitor). Using the Statistics Module, more advanced applications could monitor the state of the network with respect to Fast DDS data delivery across up to 18 different metrics and make decisions based on that (such as lowering the video quality, for instance).
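
As a rough sketch of how that monitoring could be switched on (this assumes a Fast DDS build with statistics support and uses the FASTDDS_STATISTICS environment variable; please check the Statistics Module documentation for the exact list of statistics topic names):

# Enable a few statistics DataWriters before launching the nodes to be monitored
export FASTDDS_STATISTICS="HISTORY_LATENCY_TOPIC;NETWORK_LATENCY_TOPIC;PUBLICATION_THROUGHPUT_TOPIC"
ros2 run demo_nodes_cpp talker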

At the moment yes, it is. Mind that out of the box, Fast DDS participants (ROS 2 contexts) instantiate two transports (one UDPv4 and one SHM). When setting the LARGE_DATA mode, two things happen:

  1. A third transport (TCPv4 in this case) is added.
  2. UDP transport usage is restricted to sending and receiving multicast participant announcements.

This means that a receiver with the default transports will not have a TCP transport and therefore will not be able to receive over TCP (it could still receive over SHM if both are running on the same host, though).
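
In practice, a minimal sketch of that symmetric setup could look like this (the sensor package and node names are placeholders for your own publisher; rviz2 stands in for any subscriber on the notebook):

# On the robot (sender side), before launching the sensor publisher
export FASTDDS_BUILTIN_TRANSPORTS=LARGE_DATA
ros2 run <your_sensor_package> <your_sensor_node>

# On the notebook (receiver side), before launching the subscriber, e.g. RViz
export FASTDDS_BUILTIN_TRANSPORTS=LARGE_DATA
ros2 run rviz2 rviz2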

7 Likes

@EduPonz @LMoreno great feature, thanks for sharing this.

I think it would be nice to add this to GitHub - ros2/rmw_fastrtps: Implementation of the ROS Middleware (rmw) Interface using eProsima's Fast RTPS. I understand we do not want to add every Fast DDS configuration option to rmw_fastrtps, but this one would be really helpful for ROS 2 applications.

thanks,
Tomoya

4 Likes

Ready to see large data in action?
We have recorded a practical example of Large data communication with ROS 2.
Watch as the ROSbot XL, equipped with an RPLIDAR A2 laser scanner and an Orbbec Astra RGB-D camera, navigates outdoors and indoors and maps its surroundings effortlessly.
Watch the video!

2 Likes

@LMoreno @EduPonz

I think it would be nice to add this practical setting to the rmw_fastrtps documentation.
It would be much appreciated if you could take a look!

thanks,
Tomoya

1 Like

This seems great!
Sending images or other large data with the default ROS 2 configuration is very painful!
I’m looking forward to trying it.

I think we should reconsider whether this is really needed (and by whom).

As it is today, the amount of compatibility between different RMWs is low.
For example, ROS 2 actions don’t work between CycloneDDS and Fast DDS.
We also have non-DDS RMWs (the plan is to get rmw_zenoh as tier 1), and I assume that interoperability is not expected with them.

And services aren’t always portable across vendors either.