Unconfigured DDS considered harmful to networks

We’ve only recently started to look into incorporating ROS into our industrial automation applications, and we ran into a similarly “destructive” issue.

Our experiments have been with FastDDS on Foxy.
The particular scenario that we had was:

  • Windows PC with ROS publishers
    • Network interface with an IP address on our office network
  • Linux PC with ROS subscribers
    • One real network interface with an IP address on our office network
    • Various other network interfaces created by Docker/VMware with weird local IP addresses

Running this arrangement with default settings took down our office network and saturated our internet bandwidth. Networking is not my strong suit, but my understanding of what happened is:

  • The subscriber sees that the topic it wants is being published
  • It sends ALL of its IP addresses to the publisher as potential destinations
  • The publisher then attempts to publish to all of those addresses
  • The correct one gets through (so everything appears to work)
  • But the incorrect ones have no route from the publisher (they belong to VMware etc. on the subscriber PC), so those packets go to the router and out to the internet (the sketch after this list shows the kind of address set a host advertises by default)

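To get a feel for how many candidate addresses a machine like our subscriber PC hands out, something like the rough Python sketch below can help. To be clear, this is not the actual discovery code and it assumes the third-party psutil package is installed; it just enumerates the local IPv4 addresses, which is roughly the set a default participant will advertise.

Python sketch
# Rough illustration only: enumerate the local IPv4 addresses, which is
# roughly the set of candidate locators a default DDS participant will
# advertise during discovery. Requires the third-party psutil package.
import socket

import psutil

def candidate_locators():
    addresses = []
    for interface, addrs in psutil.net_if_addrs().items():
        for addr in addrs:
            if addr.family == socket.AF_INET:
                addresses.append((interface, addr.address))
    return addresses

if __name__ == '__main__':
    # Every address printed here (office LAN, Docker, VMware, ...) is a
    # destination the publisher may try to stream data to.
    for interface, address in candidate_locators():
        print(f'{interface}: {address}')
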
In this case we were simulating data from four 64-layer lidars (hundreds of megabits per second), and it saturated our external bandwidth on and off for a few days before the problem was found (we all just thought the router was on the fritz or something).

As frustrating as it was to take down the whole office internet, the real concern for us is that our clients typically operate on limited-bandwidth networks (100 Mb/s or less for control equipment) in remote areas, in safety-critical applications, so we need to exercise caution in how we transmit data. In the past we have always used TCP as part of our approach, and we hoped that the default UDP implementation in ROS/FastDDS would still be suitable, but obviously it wasn't.

Here is the FastDDS config (@nathanbrooks) we have been using during testing, but we would be much happier if there were a safer default, since accidentally forgetting to set the environment variable can cause some serious trouble. (Yes, it will normally be done in a script; yes, we can put it in our bashrc so that anyone who logs in to tweak things has it set in their terminal; but it still doesn't sit comfortably.)

FastRTPS XML
<?xml version="1.0" encoding="UTF-8"?>
<profiles xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles">
    <transport_descriptors>
        <transport_descriptor>
            <transport_id>udp</transport_id>
            <type>UDPv4</type>
            <interfaceWhiteList>
                <address>127.0.0.1</address>
                <address>192.168.104.42</address>
            </interfaceWhiteList>
        </transport_descriptor>
    </transport_descriptors>
    <participant profile_name="participant_profile_ros2" is_default_profile="true">
        <rtps>
            <name>profile_for_ros2_node</name>
            <useBuiltinTransports>false</useBuiltinTransports>
            <userTransports>
                <transport_id>udp</transport_id>
            </userTransports>
        </rtps>
    </participant>
</profiles>
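
Since loading the profile will normally be done in a script anyway, here is roughly how we point Fast DDS at it from a ROS 2 launch file rather than relying on everyone remembering to export the variable by hand. This is just a sketch; the package and executable names are placeholders.

Python launch file (sketch)
# Minimal ROS 2 Foxy launch file that sets the Fast DDS profile for every
# node it starts. Package/executable names below are placeholders.
import os

from launch import LaunchDescription
from launch.actions import SetEnvironmentVariable
from launch_ros.actions import Node

def generate_launch_description():
    # Path to the whitelist XML shown above (adjust to wherever you keep it).
    profile_path = os.path.join(os.path.dirname(__file__), 'fastdds_whitelist.xml')

    return LaunchDescription([
        SetEnvironmentVariable('RMW_IMPLEMENTATION', 'rmw_fastrtps_cpp'),
        SetEnvironmentVariable('FASTRTPS_DEFAULT_PROFILES_FILE', profile_path),
        Node(
            package='my_sensor_pkg',       # placeholder
            executable='lidar_publisher',  # placeholder
            output='screen',
        ),
    ])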

I get that this is really the middleware's problem; perhaps a sensible default there would be to pick a single network interface (trying to be smart about it), so that things either work or don't, rather than appearing to work while causing chaos in the background.
