I am curious to hear about any experience people have had running distributed ROS setups of all shapes and sizes. Have you worked with any of the following solutions (or anything else), and how did it go? What thoughts or advice can you share?
Static ethernet network with some kind of switch
Dynamic ethernet network with some kind of router
WiFi and ethernet through a router
WiFi-Only network with a common base station / access point
WiFi Ad-Hoc network
Some kind of mixture of anything else?
As for my personal interest, I have been doing some work with distributed ROS and UAVs, and while it is all going well on the ground side, I’m seeing some severe latency spikes when going through a cheap wireless router. I’m specifically interested in whether anyone has ideas about reliable low-latency wireless links that are easy to interface with ROS.
a mobile robot with 2 PCs, one of which ran an OpenVPN server
wired to a router
that router bridged to a WLAN with multiple access points
the network of the care home (in which the robot worked) was configured to port-forward to the OpenVPN server
A control station some 60-70km away VPNed into the robot.
This resulted in a ca. 100ms ping time. With this, I could teleop the robot’s base and arms using the on-board Kinect, so OK data rate as well.
Lessons learned:
Tuning a multi-access-point WLAN, where the robot roams between APs, takes some work. Contrary to my intuition at the time, you do not want each AP at high transmit power but at low power. That way, the robot switches to the now-closest AP faster, yielding a better connection. WiFi is designed to stay ‘attached’ to the current AP as long as it gets a signal, even though that signal may be weak. Dropping a weak signal early is better in that case.
Running the VPN server on the robot is not the way to go… The server should be on a public IP, the robot behind a firewall.
I have worked with some rather large distributed systems, and best practices are hard to state without understanding your topology.
I have had the best success with a routed architecture. Each node master then uses a static packet structure to communicate with the higher-level controller (mainly due to the radio link).
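To illustrate what a static packet structure can look like, here is a minimal Python sketch using the standard `struct` module. The field layout (sequence number, timestamp, position, status byte) and the names `PACKET_FORMAT`, `pack_state` and `unpack_state` are made up for this example and are not the layout used in the system described above. The point is only that every packet has the same, known size, which makes it easy to match against the radio's frame size.

```python
import struct

# Hypothetical fixed layout (an assumption for illustration only):
#   uint32  sequence number
#   float64 timestamp in seconds
#   3 x float32 position (x, y, z)
#   uint8   status flags
# "<" selects little-endian byte order with no padding between fields.
PACKET_FORMAT = "<Id3fB"
PACKET_SIZE = struct.calcsize(PACKET_FORMAT)  # 4 + 8 + 12 + 1 bytes


def pack_state(seq, stamp, x, y, z, status):
    """Serialize one state sample into a fixed-size byte string."""
    return struct.pack(PACKET_FORMAT, seq, stamp, x, y, z, status)


def unpack_state(payload):
    """Parse a fixed-size byte string back into a dict of fields."""
    if len(payload) != PACKET_SIZE:
        raise ValueError(
            "expected %d bytes, got %d" % (PACKET_SIZE, len(payload))
        )
    seq, stamp, x, y, z, status = struct.unpack(PACKET_FORMAT, payload)
    return {
        "seq": seq,
        "stamp": stamp,
        "position": (x, y, z),
        "status": status,
    }


if __name__ == "__main__":
    packet = pack_state(7, 1234.5, 1.5, -2.25, 10.0, 3)
    print(len(packet), unpack_state(packet))
```

Because the size is constant, a truncated or merged packet is detected by the length check before any field is interpreted.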
Your spikes are more than likely caused by inefficiencies in how your packet size fits your radio’s frame size. As a test, run iperf across your radio link at different packet sizes. You should be able to see which packet sizes are the most efficient.
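If you also want to see latency per packet size, and not only throughput, a rough UDP echo probe along the following lines may help. This is a sketch and not a replacement for iperf: the function names (`run_echo_server`, `measure_rtt`, `sweep`) and the list of payload sizes are assumptions chosen for illustration, and the demo at the bottom only runs over loopback. For a real test, the echo side would run on the robot and the sweep on the ground station.

```python
import socket
import statistics
import threading
import time


def run_echo_server(sock, stop_event):
    """Echo every datagram back to its sender until stop_event is set."""
    sock.settimeout(0.2)
    while not stop_event.is_set():
        try:
            data, addr = sock.recvfrom(65535)
        except socket.timeout:
            continue
        sock.sendto(data, addr)


def measure_rtt(host, port, payload_size, count=20, timeout=1.0):
    """Send `count` datagrams of `payload_size` bytes and return the
    round-trip times in milliseconds. Lost or mismatched probes are skipped."""
    if payload_size < 4:
        raise ValueError("payload_size must be at least 4 bytes")
    rtts = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        client.settimeout(timeout)
        for seq in range(count):
            # First 4 bytes carry a sequence number, the rest is zero padding.
            payload = seq.to_bytes(4, "big") + bytes(payload_size - 4)
            start = time.perf_counter()
            client.sendto(payload, (host, port))
            try:
                reply, _ = client.recvfrom(65535)
            except socket.timeout:
                continue
            if reply[:4] == payload[:4]:
                rtts.append((time.perf_counter() - start) * 1000.0)
    return rtts


def sweep(host, port, sizes, count=20):
    """Measure round-trip statistics for each payload size in `sizes`."""
    results = {}
    for size in sizes:
        rtts = measure_rtt(host, port, size, count)
        results[size] = {
            "received": len(rtts),
            "median_ms": statistics.median(rtts) if rtts else None,
            "max_ms": max(rtts) if rtts else None,
        }
    return results


if __name__ == "__main__":
    # Loopback demo; over a real link, run the echo server on the far end.
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))
    stop = threading.Event()
    thread = threading.Thread(
        target=run_echo_server, args=(server, stop), daemon=True
    )
    thread.start()
    port = server.getsockname()[1]
    for size, stats in sweep("127.0.0.1", port, [64, 256, 512, 1024, 1400]).items():
        print(size, stats)
    stop.set()
    thread.join()
    server.close()
```

Comparing the median against the maximum for each size should show whether the spikes cluster around particular payload sizes.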