I wanted to ask the community for some thoughts on low-latency teleoperation of a robot. Here’s my scenario: consider a robot with 3 video cameras. The cameras stream at 30 FPS in HD resolution. The video streams need to be rectified and sent as H264 or H265 via UDP to a control station with as little latency as possible.
So far I have used gscam2 and image_proc’s rectify node to get a rectified image. I use raw image topics to save some overhead between the nodes. A third node uses cv_bridge, OpenCV and GStreamer to encode the image and send it via UDP to a receiving GStreamer pipeline on the control station (simplified pipeline sketches are at the end of this post). I noticed a latency of at least 300 ms (glass to glass) in my setup and was wondering what could be conceptually improved:
Is it in general a good idea to process such an image stream with ROS2 in this scenario? Alternatively, I could also send it directly (of course unrectified) with GStreamer and save some overhead.
What could be improved on the ROS2 side? My sending node is written in C++ and I use hardware acceleration for the encoding.
Are there any examples of low-latency ROS2 teleoperation scenarios? All my systems are in the same network.
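For reference, here is roughly how the two GStreamer ends currently look. This is a simplified sketch rather than my exact node: host, port, resolution and bitrate are placeholder values, and x264enc stands in for the hardware encoder I actually use.

```cpp
#include <gst/gst.h>

// Robot side: the ROS node pushes rectified BGR frames (from cv_bridge) into the
// appsrc named "src"; x264enc is a software stand-in for the hardware encoder.
static const char *kSendPipeline =
    "appsrc name=src is-live=true format=time do-timestamp=true "
    "caps=\"video/x-raw,format=BGR,width=1280,height=720,framerate=30/1\" "
    "! videoconvert "
    "! x264enc tune=zerolatency speed-preset=ultrafast bitrate=4000 key-int-max=30 "
    "! rtph264pay config-interval=1 pt=96 "
    "! udpsink host=192.168.1.10 port=5600 sync=false";

// Control-station side: small jitter buffer, software decode, sink not synced to
// the clock so frames are shown as soon as they are decoded.
static const char *kRecvPipeline =
    "udpsrc port=5600 caps=\"application/x-rtp,media=video,encoding-name=H264,payload=96\" "
    "! rtpjitterbuffer latency=10 "
    "! rtph264depay ! avdec_h264 ! videoconvert ! autovideosink sync=false";

int main(int argc, char *argv[]) {
  gst_init(&argc, &argv);
  GError *err = nullptr;
  GstElement *pipeline = gst_parse_launch(kRecvPipeline, &err);  // or kSendPipeline
  if (!pipeline) {
    g_printerr("Failed to build pipeline: %s\n", err ? err->message : "unknown");
    return 1;
  }
  gst_element_set_state(pipeline, GST_STATE_PLAYING);

  // Block until an error or end-of-stream shows up on the bus.
  GstBus *bus = gst_element_get_bus(pipeline);
  GstMessage *msg = gst_bus_timed_pop_filtered(
      bus, GST_CLOCK_TIME_NONE,
      static_cast<GstMessageType>(GST_MESSAGE_ERROR | GST_MESSAGE_EOS));
  if (msg) gst_message_unref(msg);
  gst_object_unref(bus);
  gst_element_set_state(pipeline, GST_STATE_NULL);
  gst_object_unref(pipeline);
  return 0;
}
```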
You forgot to mention the most important thing: how is the camera connected? And what is the network layer (I hope at least 1 Gbps Ethernet).
For the lowest latency, you’d use the NDI protocol. We tested many last year (RTP, RTSP, MPEG-DASH, probably more), but only NDI could provide super low latencies. The problem is that NDI is proprietary and works mostly with Windows.
If you need to go the ROS way, I’d concentrate on the whole path of the image. Does the camera driver load it directly into GPU memory? Or could it? Does the rectify node work on the GPU? (Spoiler alert: image_proc does not.) Can the encoder take the image directly from GPU memory? In extreme cases: would it be possible to send the UDP packets directly from GPU memory to the network card via some DMA? Also, do you have a good config of the encoder for real-time streaming? That can make a big difference. The same goes for the decoder.
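To make the GPU-path point concrete, here is a sketch of what “staying in GPU memory” looks like on a Jetson with a CSI camera, using NVIDIA’s L4T GStreamer elements. The element and property names are from memory and worth double-checking with gst-inspect-1.0 on your target; you can drop the string into the same gst_parse_launch harness as in your snippet above.

```cpp
// Capture-to-encoder path where frames stay in NVMM (GPU/DMA) memory the whole way.
// Property names should be verified against gst-inspect-1.0 on the actual platform.
static const char *kGpuPathPipeline =
    "nvarguscamerasrc "                                   // CSI camera, frames land in NVMM
    "! video/x-raw(memory:NVMM),width=1280,height=720,framerate=30/1 "
    "! nvvidconv "                                        // colorspace/scale on the GPU, stays in NVMM
    "! nvv4l2h264enc insert-sps-pps=true iframeinterval=30 bitrate=4000000 maxperf-enable=true "
    "! rtph264pay config-interval=1 pt=96 "
    "! udpsink host=192.168.1.10 port=5600 sync=false";
// The moment a plain videoconvert or an appsrc fed with CPU buffers appears in this
// chain, every frame is copied back to system memory, which is exactly the copy you
// want to avoid. Rectification would also have to run on the GPU (a CUDA/VPI-based
// element rather than image_proc) to preserve this property.
```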
Also, if the image is not read all at once from the camera, you could try rectifying it line by line as the rows come in from the driver.
In any case, I fear that the publicly available ROS packages will not help you with this too much.
The network layer is 1 Gbps, I can confirm that. As for the camera connection, so far I have tested it with cheap USB webcams (the better ones are still on the way). These will definitely also be responsible for a certain amount of latency, but I’m more curious about what I can already optimize on the software side while I’m waiting for the hardware. Cool, these hints are very helpful for further investigation, thanks for that.
I’d like to go the ROS way, as once the images are within the ROS system, I’d love to further extend the robot from purely teleoperated to partially autonomous as well, e.g. by using the camera streams for navigation or obstacle perception.
How to get the video to the controller (usually a web browser) with low latency, accounting for bad network conditions (bandwidth fluctuation, packet loss), and ideally without requiring a VPN.
How to bake safety into the system such that lag in receiving the video will automatically prevent control (which would be based on outdated situation awareness).
The issue in your setup is that it (presumably) does neither congestion control (monitoring available network bandwidth and reducing the bitrate when necessary to avoid lag) nor packet loss mitigation.
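For the second point, here is a minimal sketch of the kind of gate I mean: operator commands are only forwarded while the video feed is fresh, otherwise the robot gets zero velocity. Topic names and the 200 ms threshold are placeholders, not something from an existing package.

```cpp
#include <memory>
#include "rclcpp/rclcpp.hpp"
#include "sensor_msgs/msg/image.hpp"
#include "geometry_msgs/msg/twist.hpp"

// Forwards teleop commands only if a camera frame arrived recently; otherwise it
// publishes zero velocity so stale situation awareness cannot drive the robot.
class VideoWatchdogGate : public rclcpp::Node {
public:
  VideoWatchdogGate() : Node("video_watchdog_gate") {
    cmd_pub_ = create_publisher<geometry_msgs::msg::Twist>("cmd_vel", 10);
    last_frame_ = now();
    image_sub_ = create_subscription<sensor_msgs::msg::Image>(
        "image_rect", rclcpp::SensorDataQoS(),
        [this](sensor_msgs::msg::Image::ConstSharedPtr) { last_frame_ = now(); });
    cmd_sub_ = create_subscription<geometry_msgs::msg::Twist>(
        "cmd_vel_teleop", 10,
        [this](geometry_msgs::msg::Twist::ConstSharedPtr msg) {
          const bool fresh = (now() - last_frame_) < rclcpp::Duration::from_seconds(0.2);
          cmd_pub_->publish(fresh ? *msg : geometry_msgs::msg::Twist{});
        });
  }

private:
  rclcpp::Publisher<geometry_msgs::msg::Twist>::SharedPtr cmd_pub_;
  rclcpp::Subscription<sensor_msgs::msg::Image>::SharedPtr image_sub_;
  rclcpp::Subscription<geometry_msgs::msg::Twist>::SharedPtr cmd_sub_;
  rclcpp::Time last_frame_;
};

int main(int argc, char **argv) {
  rclcpp::init(argc, argv);
  rclcpp::spin(std::make_shared<VideoWatchdogGate>());
  rclcpp::shutdown();
  return 0;
}
```

The same idea extends to the browser side: if the received stream stalls or lags badly, the UI should stop sending commands as well.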
I think WebRTC should largely solve the problem of getting the stream to a web-based controller app in particular. Upstream of it, would you still recommend going through ROS2? Where would you place the image rectification step with WebRTC?
200 ms was from USB camera to browser. But going through ROS should only add marginally more, given that ROS image topics are just full image frames, i.e., producing these from the cameras does not involve any processing that would require a jitter buffer or similar, which would add latency.
May I ask what resolution? Did you use any hardware acceleration? I saw that WebRTC can only decode H264 but not H265. Did you see any impact here? Can you give any recommendation for the onboard processing before handing the stream over to WebRTC? I need to at least rectify the image and would love to go with image_proc, as it’s available off the shelf. I haven’t measured how much latency this adds, as I’m still waiting for the final hardware.
Resolution: doesn’t affect latency, but we’ve seen users stream four 1920x1080 cameras at the same time.
Hardware acceleration: yes, our implementation supports it (Nvidia Jetson/Orin, Nvidia GPUs, RockChip, and Intel VA-API) and I recommend using it, but it doesn’t affect latency either.
H264 vs H265: WebRTC supports either (plus VP8, VP9, and AV1), but H264 is better for this application than H265. The latter gives you better image quality at the same bitrate, but costs more to encode and decode, I think. AV1 would be best, but isn’t well supported by browsers yet.
Rectifying: using image_proc in ROS should work fine. Since, again, this is a per-image operation, it shouldn’t add much latency (my guess would be less than 20ms).
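If you want to put a number on that before your hardware arrives, you can compare the header stamp of the raw and the rectified topic against the time of arrival in a subscriber. A small sketch (topic names depend on your camera driver; this measures delay relative to the driver’s stamp, not glass to glass):

```cpp
#include <memory>
#include <string>
#include "rclcpp/rclcpp.hpp"
#include "sensor_msgs/msg/image.hpp"

// Prints how old each frame is (receive time minus header stamp) for the raw and the
// rectified topic; the difference between the two is roughly what rectification plus
// the extra hop costs.
class StampLatencyProbe : public rclcpp::Node {
public:
  StampLatencyProbe() : Node("stamp_latency_probe") {
    raw_sub_ = make_probe("image_raw");
    rect_sub_ = make_probe("image_rect");
  }

private:
  rclcpp::Subscription<sensor_msgs::msg::Image>::SharedPtr make_probe(const std::string &topic) {
    return create_subscription<sensor_msgs::msg::Image>(
        topic, rclcpp::SensorDataQoS(),
        [this, topic](sensor_msgs::msg::Image::ConstSharedPtr msg) {
          const double age_ms =
              (now() - rclcpp::Time(msg->header.stamp)).seconds() * 1000.0;
          RCLCPP_INFO(get_logger(), "%s: %.1f ms old", topic.c_str(), age_ms);
        });
  }

  rclcpp::Subscription<sensor_msgs::msg::Image>::SharedPtr raw_sub_, rect_sub_;
};

int main(int argc, char **argv) {
  rclcpp::init(argc, argv);
  rclcpp::spin(std::make_shared<StampLatencyProbe>());
  rclcpp::shutdown();
  return 0;
}
```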
But you can try all this out yourself if you want. Our remote-teleop capability only takes a few minutes to set up and it automatically finds available ROS topics on your robot. DM me if you need help.