Hi all,
tl;dr: Switching to a router with IGMP snooping stopped most of the packet drops caused by discovery multicast over wifi.
So recently I’ve been hitting a whole bunch of problems trying to set up ROS2 Foxy with multiple drones over wifi. In our system I have a node translating the motion capture stream into a topic for each vehicle. Then we have mavros + px4 on the other end receiving the data.
I found that if I turned on the 3rd drone, the motion capture stream would very consistently start to drop packets. In some cases all packets would be lost for a full second, causing my drones to crash. I called this “morse coding”; the screenshot below shows a live Foxglove view of the position data for the 3 drones.
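For anyone wanting to put numbers on this kind of dropout rather than eyeballing a plot, here is a minimal sketch that finds gaps in a list of receive timestamps. The function name `find_dropouts`, the 0.5 s threshold and the example timestamps are all made up for illustration; they are not taken from my actual setup.

```python
from typing import List, Tuple


def find_dropouts(stamps: List[float], max_gap: float) -> List[Tuple[float, float]]:
    """Return (start_time, duration) for every gap between consecutive
    receive timestamps (in seconds) that is longer than max_gap."""
    gaps = []
    for prev, curr in zip(stamps, stamps[1:]):
        duration = curr - prev
        if duration > max_gap:
            gaps.append((prev, duration))
    return gaps


# Hypothetical 10 Hz stream with a roughly 1 s hole after t = 0.2 s.
example = [0.0, 0.1, 0.2, 1.2, 1.3, 1.4]
dropouts = find_dropouts(example, max_gap=0.5)
print(dropouts)
```

Feeding it the receive times of one drone's pose topic (for example, logged in a subscriber callback) gives a list of when each dropout started and how long it lasted, which makes before/after comparisons easier.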
I tested everything I could think of: capture output rate, number of nodes, number of topics, updating and switching to mavros2 instead of using ros1_bridge, and changing ROS receive buffer sizes. None of it made any difference. The most reliable way to reproduce the problem was to plug a drone into ethernet and then switch it to wifi, at which point the drops started immediately.
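That pointed me to the router and wifi being the problem. I was already aware that multicast over wifi has the potential to fragment packets and cause a mini DDoS (if somebody can explain this better, I would love to hear about it), but this is the first time I have definitively seen it in action. I was also aware of a feature in modern routers called IGMP snooping. My original understanding was that the router unwraps multicast packets and sends them over a normal TCP-like connection. Strictly speaking, snooping means the router watches IGMP membership reports and forwards multicast only to the clients that joined the group; converting multicast into per-client unicast is a separate feature that many wifi routers pair with it.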
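To make the multicast part a bit more concrete, here is a rough sketch of what joining a DDS discovery group looks like at the socket level. As far as I understand it, the join is what makes the OS send an IGMP membership report, and those reports are what a snooping router listens to. The group address `239.255.0.1` and the port formula are the defaults from the DDSI-RTPS specification; I am assuming the DDS vendor in use has not overridden them. The function names are just illustrative, and nothing here is taken from my actual configuration.

```python
import socket
import struct

# RTPS default port parameters (PB, DG, d0) and default discovery
# multicast group, as given in the DDSI-RTPS specification.
# Assumption: the DDS vendor in use has not overridden these.
PORT_BASE = 7400
DOMAIN_GAIN = 250
D0 = 0
DISCOVERY_GROUP = "239.255.0.1"


def discovery_multicast_port(domain_id: int) -> int:
    """Discovery multicast port for a given domain (ROS_DOMAIN_ID)."""
    return PORT_BASE + DOMAIN_GAIN * domain_id + D0


def membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Build the ip_mreq struct that IP_ADD_MEMBERSHIP expects."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface))


def join_discovery_group(domain_id: int = 0) -> socket.socket:
    """Open a UDP socket and join the discovery group.

    Joining makes the kernel send an IGMP membership report, which is
    what a snooping switch or router uses to decide which clients
    should receive this group's traffic.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", discovery_multicast_port(domain_id)))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(DISCOVERY_GROUP))
    return sock


print(discovery_multicast_port(0))  # port to watch for domain 0
```

Calling `join_discovery_group()` on a machine attached to the network and then reading from the returned socket with `recvfrom` should show whether discovery traffic is reaching that client; the port number is also handy as a capture filter.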
I borrowed a router (Asus GT-AX6000) with IGMP snooping enabled and it seems to have stopped the majority of the drops. In the following image you should be able to see the point where I enabled snooping, after which the dropouts are almost completely eliminated.
I still get dropouts every now and again, but they last at most half a second and happen infrequently, on average one every few minutes or so. At least now the drones fly a lot more reliably. The ROS comms also feel more responsive (previously I sometimes had lag after pressing mission go, for instance).
Related topics: ROS2 Default Behavior (Wifi) - #40 by srushtibobade, and also this topic about standard ROS2 being harmful to networks: Unconfigured DDS considered harmful to Networks
Hope this post helps some people out in the future. It was an absolute pain to diagnose, so I thought writing it up might be useful!
Many Thanks,
Mickey Li