Many robotics companies need to see their robots’ live video from afar, e.g., for tele-operation, monitoring, or debugging. However, implementing this well isn’t trivial, especially when you want to embed that video feed in a web page and your robots are connected over 5G or customer Wi-Fi.
How do folks currently solve this? Please share your approach by selecting and/or commenting below.
ROS web_video_server
Sending individual images via a cloud proxy over web protocols (HTTP, WebSockets, etc.)
RViz via SSH tunnel or VPN
Foxglove via remote WebSocket connection
Custom solution built in-house (please describe in comment)
I’ve done lots of testing and comparison of many of these solutions. There’s a bit of a learning curve, but GStreamer is very hard to beat. There is a new “webrtcsink” element that supports adaptive bitrate streaming using GCC (Google Congestion Control), although SCReAM is another algorithm that would be worth looking at, especially for 4G/5G operation. It works with VP8, H.264, etc., and will adjust resolution, compression and more to fit the available bitrate.
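To give a feel for what "adaptive bitrate" means here, below is a rough Python sketch of only the loss-based half of GCC, using the thresholds from the GCC Internet-Draft (back off above 10% loss, probe upward below 2%). It is not what webrtcsink runs internally: real GCC also has a delay-based estimator, and the function name `next_bitrate` and the min/max limits are made up for illustration.

```python
def next_bitrate(current_bps: float, loss_fraction: float,
                 min_bps: int = 200_000, max_bps: int = 4_000_000) -> int:
    """Return the next target bitrate given the last reported packet loss.

    Simplified loss-based rule (thresholds as in the GCC Internet-Draft):
      loss > 10%  -> reduce in proportion to the loss
      loss < 2%   -> increase by 5%
      otherwise   -> hold
    min_bps / max_bps are hypothetical encoder limits, not GCC constants.
    """
    if loss_fraction > 0.10:
        target = current_bps * (1.0 - 0.5 * loss_fraction)
    elif loss_fraction < 0.02:
        target = current_bps * 1.05
    else:
        target = current_bps
    # Clamp to what the encoder can actually produce.
    return round(min(max(target, min_bps), max_bps))


if __name__ == "__main__":
    rate = 1_000_000
    # Pretend loss reports coming back from the receiver, one per interval.
    for loss in (0.0, 0.0, 0.15, 0.30, 0.05, 0.0):
        rate = next_bitrate(rate, loss)
        print(f"loss={loss:.0%} -> target {rate / 1000:.0f} kbit/s")
```

The encoder is then told to hit the new target, which is where the resolution and compression changes come from.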
Motion JPEG stream, from IP camera to robot to server to browser, via some custom proxying and tunneling (the robots are on LTE).
We use MJPEG because it’s very low latency on our equipment, and easy to forward to a browser. It’s also a simple protocol, so it’s easy to hack if we need to, and it works on everything. If we push it, we can get 12 frames per second with JPEG compression turned up. But most of the time we do 720p or 1080p @ 5 fps. Total latency is really low, usually below 150 ms even over thousands of km.
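For anyone wondering how simple the protocol is: MJPEG over HTTP is usually just a `multipart/x-mixed-replace` response where every part is one JPEG. Here is a minimal Python sketch of the framing and of pulling frames back out in a proxy. The boundary name `frame` and the helper names are placeholders I picked, and it assumes each part carries a `Content-Length` header, which not every camera sends.

```python
BOUNDARY = b"frame"  # placeholder; the real value comes from the Content-Type header


def mjpeg_part(jpeg: bytes) -> bytes:
    """Wrap one JPEG image as a single multipart/x-mixed-replace part."""
    return (b"--" + BOUNDARY + b"\r\n"
            b"Content-Type: image/jpeg\r\n"
            b"Content-Length: " + str(len(jpeg)).encode() + b"\r\n\r\n"
            + jpeg + b"\r\n")


def split_parts(stream: bytes) -> list[bytes]:
    """Extract the complete JPEG frames from a buffered chunk of the stream.

    Relies on Content-Length, so boundary-like bytes inside the image data
    cannot confuse it. A trailing incomplete part is ignored.
    """
    frames = []
    marker = b"--" + BOUNDARY + b"\r\n"
    pos = 0
    while True:
        start = stream.find(marker, pos)
        if start < 0:
            break
        header_end = stream.find(b"\r\n\r\n", start)
        if header_end < 0:
            break  # headers not fully received yet
        length = None
        for line in stream[start + len(marker):header_end].split(b"\r\n"):
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        if length is None:
            break  # this sketch does not handle parts without a length
        body_start = header_end + 4
        if body_start + length > len(stream):
            break  # body not fully received yet
        frames.append(stream[body_start:body_start + length])
        pos = body_start + length
    return frames


if __name__ == "__main__":
    fake_frames = [b"\xff\xd8first\xff\xd9", b"\xff\xd8second\xff\xd9"]
    wire = b"".join(mjpeg_part(f) for f in fake_frames)
    print(len(split_parts(wire)), "frames recovered")
```

A proxy that forwards to the browser only needs to re-emit each part under a response header of `Content-Type: multipart/x-mixed-replace; boundary=frame`, and an `<img>` tag pointed at that URL will render it.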
WebRTC seems ideal, if encoding latency can be controlled. A lot of IP cameras’ H.264 streams are a second or two behind to start with. On top of that, it’s really hard to get an H.264 (or H.265) stream through a server into a browser.
I’ve heard of people using GStreamer, but I’ve not been able to get a working demo myself yet. If it supports WebRTC directly now, I’ll have to give it another go. We also do 2-way audio via a separate system. WebRTC would hopefully allow us to combine the audio and video in the same stream.