
ROS2 Default Behavior (Wifi)

If you recommend using a discovery server, why is that unable to work out of the box? I think you’re missing or ignoring the point. I’m asking a thematic question about whether we as a ROS community value good out-of-the-box support on non-theoretically-ideal wifi, even if it slightly degrades performance in other aspects.

That sentiment is the problem I’m trying to solve. From a roboticist’s perspective, I find this an unacceptable solution. You cannot ask a new user to configure or optimize their network topology to get a simple ROS2 demo working with their robot. If I buy a Fetch robot from Fetch Robotics, stick it in my cubicle at work, build a ROS2 application, and can’t get an HD image stream and a couple of pointclouds from the robot to my computer in rviz, that’s a critically impaired process. I find it acceptable to need to optimize performance at the point where you are deploying in some massive industrial setting. What you suggest there defines the following workflow for the “average user” with the skill sets I describe above:

  • Learn ROS2 API
  • Get some robot to play with
  • Build their custom application
  • Run it and fail
  • Immediately hit the wall with ROS, and then have to spend a week or more learning enough about DDS to know what dials to turn, or learning enough about networking to have a nuanced understanding of multicast, routers, unicast, discovery servers, etc.
  • Or they give up and spin up their own UDP socket, or regress back to ROS1, which is exactly what I would do in that situation.

Working reasonably well out of the box on corporate wifi in a steady-state setting is the minimum viable product of ROS2, in my opinion. This is what I’m referring to in the original post about treating wifi as a special case. This capability should be baked in on install, and if the solution a specific vendor implements doesn’t allow for that, perhaps the requirements that drove that solution should be reconsidered.

I’m describing a situation with 1 robot with 50 nodes on 3 computers, which is a hello-world case in robotics (perhaps not in networking, but we’re not a networking community, we’re a robotics community). Navigation uses 19 nodes by itself. Add sensor drivers, hardware drivers, and user-space applications, and you hit 40-50 very quickly. There’s going to be a bunch of data flying around; HD images and 3D pointclouds are not exceptions to our needs, they are the definition of them.

I really didn’t want to make this conversation pointed towards one vendor or another, but this is simply not true. That may be so for your idealized network case, with a few nodes and a router sitting right next to them with no other traffic, but it is not the experience shared by multiple groups within Samsung, Rover, multiple groups in/around Intel, and myself. If we want to get granular, it seems Cyclone’s SPDP option fixes this to a reasonable degree without any additional configuration / hardcoding of addresses. For them, my original post’s intent would just be having that enabled by default, or providing that in a default configuration file with ROS2 installs.

My goal still isn’t to talk about any particular vendor or any particular vendor’s solution to this problem, but in getting overall agreement that “yes, having good wifi support out of the box is something we find very important” or “no, having good wifi support isn’t worth it, we should instead provide accessible documentation”.

I’d be interested to have @joespeed, @tfoote, and @rotu jump in with their thoughts. But really, I’d welcome anyone who has an opinion on whether this is a problem for their needs, to get some more outside input.


Hi @smac,

In Fast RTPS, multicast is used just for participant discovery (so the “SPDP option” is not required; EDP always uses unicast), and we have very defensive defaults for this kind of network. After Intel and others reported some problems over wifi with navigation, we reacted by modifying the discovery behaviour and changing the default parameters some months ago, so that is already in place.

With those settings (now the default), our performance is really good. See this video from @rotu:

This is the case you are talking about, I think.

When you need to improve this behaviour even further, because multicast is not an option in your network, then you should use the discovery server or unicast discovery.

The discovery server is included in Fast RTPS; any node can be a discovery server, and you do not need anything more. We can also provide easy configuration options. Right now, it is as simple as this:

https://fast-rtps.docs.eprosima.com/en/latest/use-cases.html#udpv4-example-setup
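As a rough sketch of what that setup pattern looks like (the exact CLI flags and environment-variable names vary between Fast RTPS / Fast DDS versions, so treat the specifics here as illustrative rather than authoritative; see the linked docs for your version):

```shell
# On one machine, run a discovery server; in recent Fast DDS versions the
# CLI looks roughly like this (illustrative, requires the fastdds tool):
#   fastdds discovery -i 0 -l 192.168.1.10 -p 11811

# Each ROS2 node then skips multicast discovery entirely and registers
# with the server instead, pointed at it by an environment variable:
export ROS_DISCOVERY_SERVER="192.168.1.10:11811"
echo "discovery server set to: $ROS_DISCOVERY_SERVER"
```

The appeal for wifi is that all discovery traffic becomes unicast to one known address, rather than multicast flooding the access point.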

But we can even go further. I have no particular problem in making this discovery mechanism the default.


Awesome. If other users in this thread come back and we have some agreement that better wifi support is something we want out of the box, let’s make that the default configuration for your DDS then. In the meantime, let’s stop with the specifics of your product and let users and others express their thoughts on the topic I actually addressed in my post.

This is not an affront to Fast RTPS, or advocacy for Cyclone, or really caring much at all about the implementation details under the hood of the ROS2 RMW. I’m looking at evidence-based conclusions from external users who contacted me while trying to deploy mobile robotics applications using ROS2 and DDS, and who hit issues getting navigation stack things working. Clearly, there is some issue or disconnect. I don’t know what it is, and for the purposes of this discussion, it’s implementation details. The question at hand is more thematic: are we willing to make trade-offs to support this, and is this as big a problem as I feel it is?

I agree quite strongly with this. While any engineer knows that when you move to production you will need to spend time optimising and making things reliable, in a typical office using the office wifi is where prototyping most often begins, and that is when many foundational technology decisions are made, including “Should I use ROS 1, ROS 2, or my own solution?”.


It seems you already have some ideas – or at least have talked to some vendors about this @smac

Why don’t we make this more actionable and just list the options/approaches/settings that have worked/should work for the types of networks you have encountered difficulties with?

I don’t believe anyone here is going to say: “no, I don’t want wifi to work, I’d rather spend 3 weeks reading DDS documentation”.

So let’s get to it and start working on solutions.

So far, I’ve seen mentioned:

  • SPDP for Cyclone
  • Discovery server / unicast discovery for FastRTPS

Without listing these, there doesn’t seem to be a way to discuss whether they should be “the default”, as we’d be drawing a conclusion not based on technical facts, but based on a desire. And without knowing which options there are, we cannot determine whether they would also work sufficiently for non-wifi setups (as I would not know what to turn on/off).

There is a reason these are not the defaults in many DDS implementations, and it’s likely it is a good one. Whether it is a good one in the scenarios you’ve described remains to be seen.

I’m not the expert in this, so I will abstain. I just know it’s a problem that needs to be solved, from the experiences of many users trying to use out-of-the-box ROS2 in typical office/robotics environments. I would rather build consensus that this is a problem, build a requirement around it, and then deliver that requirement so we can find a technical solution. I don’t want a technical solution looking for a problem. I think each vendor will have their own solution, but part of the reason some work better than others is that they designed their technical solutions in the absence of actual feedback from robotics users. I think this is generally better practice. XYZ dials existing doesn’t mean we don’t actually need different dials.

I wouldn’t necessarily assume that. It may be true that they’re not default from the generic DDS implementation, but I don’t think there’s a particular reason those are the defaults for the ROS2 use-case of them.

Really, what I want to hear from are users of ROS2 if this is a problem they have. I want to build a general consensus before anything further. To me, this problem makes ROS2 DOA for mobile robotics users, which as @Katherine_Scott’s post shows is 50% of total ROS users. If I showed up off the street with the problems that exist from a clean ROS2 install, I’d immediately disregard ROS2 as an academic toy and move on.


Hi @gavanderhoorn,

Just to clarify, “SPDP” is already the default behavior for Fast RTPS.


@smac I 100% agree that the focus should be on good behavior by default, as well as anything else that reduces the barrier to entry.

@Jaime_Martin_Losa Thanks for sharing that bringup video. I learned a ton from making it, including the performance implications of wifi multicast (very bad, not just for the app that’s doing the multicasting but for my poor coworkers trying to use the network normally!). Also, I’m really glad that ADLINK and eProsima were able to use the findings to improve their products across the board!

That Pi setup is beautiful! Where can I find the test setup that you use on it?


Putting my 2 cents in this:

Yes, I, as a user, really believe this is a big problem.
I experienced multiple occasions where wifi was the biggest show stopper for ROS2.

  • University: should we switch teaching from ROS1 to ROS2? ROS1 is a big hassle regarding setting up the ROS master for some 20 students who all want to control their own turtlesim on a projection in class. The wifi network can mostly withstand that up to some point, but then just crashes totally. This is more due to the local wifi setup and nothing to do with ROS.
    After doing some research on ROS2 and coming across various posts on wifi issues with even 1 robot, the decision to stay on ROS1 was immediately made. During such courses students mostly get a ready-made platform with some sensors to play with and then do all the programming on their own laptops. The concept of remote login via ssh is mostly unknown to them. Therefore a stable wifi connection out of the box is strongly required.

  • ROS2 training from Fraunhofer IPA: roughly 10 participants (50% academic, 50% industrial). Everyone really enjoyed all the new features of ROS2 (including Navigation2) and liked playing around with their Gazebo instances with a turtlebot inside. After we started to go onto the real hardware at more or less the same time, the network crashed for everybody… I don’t think that this was the fault of a poor setup. It’s just hard to teach ROS2 in a 3-day crash course and then take a huge DDS configuration into account as well… But sadly, after this, the general opinion among academic and industry participants alike was just like this:

It really saddens me, as I will nevertheless continue to push ROS2 in industrial environments. But those attending these courses were mostly highly skilled and needed to report to their management on whether ROS2 would be an option for them. Again, I don’t want to blame Fraunhofer IPA for this, as they did an amazing job at preparing a really cool workshop. But this is exactly the thing @gavanderhoorn pointed out:

Industrial: Nothing particular about wifi comes to my mind right now. But I remember that at least 2 research teams I came across last year stuck with ROS1, as they were happier with its network stack for their particular use-cases.


Ok, clear.

Then I agree with you: for now, first build consensus and then start looking at how to tackle this in the best way possible.


Good performance over WIFI and other kinds of “lossy” networks has always been one of the major pain-points that ROS 2 is supposed to address, and the use of DDS has been at least partially motivated by the promise that it would enable this. Nobody explicitly said “without major configuration hassle”, but I think this went without saying, as people were used to that with ROS 1.

Therefore, I very much support @smac’s initiative here and think easy to use, high performance networking is something we as a community need out-of-the-box.

My personal revelation came at ROSCon 2019 during the iceoryx presentation, when the presenters showed how badly image transport works without iceoryx (not at all smooth). I mean, iceoryx is great, particularly once you really start increasing the throughput, but for a single image stream it should not be necessary, particularly not on localhost.

Since @gavanderhoorn has been asking for examples to test with, that would be my first suggestion. Now, I realize that “smooth image streaming” is not necessarily required for a robust and performant system, but it’s one of those “presentation” issues. People will notice it very rapidly when it’s not there. So, unless improving that reduces performance elsewhere, I think it’s a good test.

The other thing I noticed is that the navigation users seem to have more trouble than other people who may be using “simpler” setups that just stream a little bit of sensor data and a few commands (like many of my own simple test experiments).

Therefore, it may be that the number of nodes present is a limiting factor, at least as long as each node is mapped to a participant. This could also be translated into a test.

The last thing that comes to mind right now, though certainly not the least important: it seems as if discovery is a major issue, so we might want to look at performance during discovery and after discovery separately.


Thanks @smac for raising the visibility of this long-standing issue, and to others for chiming in. Open Robotics is completely in agreement with your proposition that ROS 2 should work well out-of-the-box in common wifi environments, using defaults. If we can’t match ROS 1’s performance or ease-of-use in that setting, without special configuration from the user, then we’re doing something wrong.

As our team wraps up their work for the Foxy API freeze, we’re assigning some people to specifically investigate the wifi behavior problem over the next several weeks. We don’t know yet what the nature of a fix will be, but we’re hoping to see some material improvement in time for Foxy. We’ll be working closely with the vendors as we go, and we welcome help from all of you!


Exactly my sentiment; that sounds reasonable. Thanks for taking on the action item to build a requirement and then figure out what the solution is. I am chatting with colleagues in Korea about running some experiments, and I’ll get back to you or post here with relevant results from their experiences. They have easier access to complex corporate wifi network environments than I have in my 1000 sqft apartment.

To be clear, I don’t want to give the impression that “we’ve got this; everybody else can just wait for the fix to be released.” While we are investing time and effort in the issue, we still want help from everyone who can contribute. I’ll defer to @dirk-thomas to link to the relevant ticket(s).

Seems like a good time to start the middleware working group to take up lower level issues such as these. It’s very likely that this type of issue has been discussed in OMG meetings which means those members would have the best insights into how to find the right balance.


Just a couple of cents more. You can expect a much better experience out of the box in the Foxy release, as things have improved a lot in the meantime.

1.- Discovery over WIFI (@smac): A lot of improvements have been made and both the behavior and the defaults now are specifically tuned for this scenario.

Also, as I promised, here is the first article of a series of intensive experiments with the Raspberry Pi Farm:

Fast-RTPS Server-Client Discovery Analysis

This is available in Dashing, and optimized in Foxy. As expected, it works very well for wifi networks not supporting multicast. We are studying how to simplify the required configuration even more, but it could be as simple as specifying the master in ROS1: if you export an environment variable, we could select that discovery mechanism, for example.
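The analogy to the ROS1 workflow can be made concrete. A hedged sketch: the ROS1 variable below is the real one, while the ROS2-side variable name is illustrative of the proposal (Fast DDS has since adopted `ROS_DISCOVERY_SERVER` for this role):

```shell
# ROS1: every node finds all others through one well-known master
export ROS_MASTER_URI="http://robot:11311"

# The proposal above: one analogous variable switches ROS2 discovery from
# multicast to a known discovery server (variable name illustrative)
export ROS_DISCOVERY_SERVER="robot:11811"
echo "$ROS_MASTER_URI / $ROS_DISCOVERY_SERVER"
```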

2.- Streaming and big data in the intra-process and inter-process case (@Ingo_Lutkebohle): This week we released Fast RTPS 1.10 (already available), with a complete shared memory transport and optimized intra-process behavior. This is the first time this feature is available in an open-source implementation of DDS, and as expected it decreases latency and increases throughput a lot for big messages, such as those used in video streaming.

We are now in the process of characterizing the performance, and we will publish some results soon, but as an example, for a 2 MB message the latency decreases around 20 times in the inter-process case.

This feature is enabled by default, so no configuration is required, and it is available on all ROS 2 supported platforms.
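To illustrate why a shared memory transport helps so much for big messages (this is just a Python stdlib sketch of the idea, not Fast RTPS's actual implementation): the writer copies the payload once into a named shared segment, and the reader maps that same memory by name, so megabytes never travel through the loopback network stack byte-by-byte.

```python
from multiprocessing import shared_memory

# A 2 MiB payload standing in for a large message (e.g. an HD image frame).
payload = bytes(range(256)) * (2 * 1024 * 1024 // 256)

# "Publisher": copy the message once into a named shared-memory segment.
pub = shared_memory.SharedMemory(create=True, size=len(payload))
pub.buf[:len(payload)] = payload

# "Subscriber": attach to the same segment by name -- the data is never
# pushed through a socket, the OS simply maps the memory here too.
sub = shared_memory.SharedMemory(name=pub.name)
received = bytes(sub.buf[:len(payload)])
print(received == payload, len(received))  # prints: True 2097152

sub.close()
pub.close()
pub.unlink()
```

In a real transport the segment would be a ring buffer shared between processes with change notifications, but the cost model is the same: one copy in, one mapped read out.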

3.- Benchmarking as part of our CI: We at eProsima are continuously making a big effort to improve performance and scalability, with a dedicated team for performance and benchmarking tests as part of our CI. We usually add more scenarios and tests when customers or the community describe a performance issue, and we will stay tuned to keep improving.


Just to be clear: this version of Fast-RTPS is currently not being used by any ROS distro - not even master which will become Foxy.

Hi @dirk-thomas

Just to update this thread: shared memory transport will now be available in Foxy.

Also, we have an updated discovery study here.


@Jaime_Martin_Losa thanks for the update!

btw, your latest documentation doesn’t seem to have the use-cases link anymore! So the links in this thread to the use-case documentation don’t work.

I had a look at the study, and one thing struck me: you have 29 participants with 10 endpoints each, which is a small network by robotics standards. And discovery traffic for this network, even in the very best case, causes ~40,000 packets to be exchanged, and close to 90,000 packets for what’s currently the default case (serverless discovery).

No wonder the wifi breaks down. Even without multicast issues, that’s a lot of packets.

This really strikes me as ridiculous. Sure, if the goal were a fully meshed network, with every participant talking to all the other participants, fine. That would be a lot of connections, and TCP would do worse setting it all up. But this is not what usually happens. Most of our systems are very sparsely connected. The large majority of endpoints are only ever accessed by exactly one other participant. There are a few exceptions of course (/tf comes to mind), but I would still say that the average is somewhere between 1 and 2.

I mean, ROS 2 fires up ~15 services per node just for parameters and the lifecycle! In most cases, nobody except launch ever accesses those. ROS 1 took advantage of this: nodes only asked the roscore for the topics and services that they actually needed.

There must be a way to take advantage of this fact to reduce discovery traffic for DDS as well.
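A toy back-of-the-envelope model makes the quadratic blow-up concrete. This is not the real RTPS protocol, and the overhead factor is a guess, but it shows how full-mesh endpoint exchange compares with a sparse matchmaking scheme for the study's setup:

```python
def serverless_discovery_msgs(participants, endpoints_each, overhead=2):
    """Rough count of endpoint-discovery messages for full-mesh discovery:
    every ordered pair of participants exchanges data about every endpoint,
    with `overhead` standing in for heartbeats/acknacks (a guessed factor)."""
    pairs = participants * (participants - 1)  # ordered pairs in the mesh
    return pairs * endpoints_each * overhead

# The study's setup: 29 participants with 10 endpoints each.
full_mesh = serverless_discovery_msgs(29, 10)

# If each endpoint is only relevant to ~2 peers on average, a matchmaking
# scheme (e.g. a discovery server) needs something on the order of:
sparse = 29 * 10 * 2  # each endpoint announced once, matched to ~2 peers
print(full_mesh, sparse)  # prints: 16240 580
```

Even this optimistic model lands in the tens of thousands of messages for the mesh case, the same order of magnitude as the study, while sparse matchmaking is more than an order of magnitude cheaper.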
