Title: Guidance on Scaling Free Fleet with Zenoh: Robot vs Server Code Allocation (#618)

Posted by @Oussama-Dallali99:

I’ve been working on implementing Free Fleet with the easy-full-control + Zenoh version for managing a fleet of robots. My setup includes:
1. A robot adapter that I created (similar to the nav1 and nav2 ones)
2. A fleet adapter that I’m currently writing to integrate everything smoothly.

I am now considering scaling this solution to a large fleet of robots and would like clarification on the following:
1. Scalability: Is Free Fleet with Zenoh (easy-full-control version) suitable for a large-scale deployment? If not, are there specific limitations or areas to be cautious about?
2. Code Allocation:
• What parts of the system (from Free Fleet’s implementation) should reside on the robot side, and what should be managed on the server side?
• Are there any best practices or architectural guidelines for splitting responsibilities between the robot and server components to ensure reliability and performance at scale?

I’d appreciate any guidance or suggestions.

Thank you for your support!

Posted by @aaronchongth:

Hello @Oussama-Dallali99! Great to hear that you have started looking into this new implementation of free fleet and its integration with Open-RMF! Let me try to answer your questions:

  1. Scalability: Is Free Fleet with Zenoh (easy-full-control version) suitable for a large-scale deployment? If not, are there specific limitations or areas to be cautious about?

One of the reasons Zenoh was chosen for the new Free Fleet implementation is that it is a well-tested and well-benchmarked framework. For scalability, we believe Zenoh’s router-based approach is the best fit for distributed systems like mobile robots. Instead of us managing our own DDS configuration, it is much more beneficial for users to configure their Zenoh bridges and router(s) according to their deployment, to ensure the best performance.

The out-of-the-box experience with the ROS / ROS 2 bridges was a great bonus too.

Unfortunately I don’t have a good answer for scale in terms of how many robots can be integrated with a single fleet adapter, as there are many considerations around networking and deployment. Based on the Zenoh blog post, Zenoh Reliability, Scalability and Congestion Control, the brokered throughput is technically capable of supporting extremely large numbers of robots: assuming 100 KB payloads, it can technically handle around 50k messages per second, so at, say, 20 messages per robot per second, that still works out to thousands of robots.
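
To put that back-of-the-envelope estimate in concrete terms, here is a quick sketch (the throughput and message-rate figures are the illustrative numbers from above, not guarantees for any particular deployment):

```python
# Back-of-the-envelope fleet capacity estimate, using the illustrative
# benchmark figures quoted above (not guarantees for any deployment):
# ~50k brokered messages per second at ~100 KB payloads, and an assumed
# ~20 status/update messages per robot per second.
broker_msgs_per_sec = 50_000
msgs_per_robot_per_sec = 20

max_robots = broker_msgs_per_sec // msgs_per_robot_per_sec
print(f"Rough upper bound: {max_robots} robots")  # -> Rough upper bound: 2500 robots
```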

As for limitations: as with all distributed systems, the closer and better connected the machines are (as opposed to piping traffic through the cloud and across the ocean), the better the performance will generally be. There may also be network outages, for example when a robot enters a lift or passes through an area with weaker signal. In these scenarios, users are encouraged to configure the Zenoh bridges and routers so that re-connections can happen as soon as possible, as in the sketch below. Open-RMF is designed to be robust against such flaky robot update conditions, so re-connecting for updates allows operations to continue as usual.
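
As a rough illustration of that kind of tuning, here is a sketch that opens a client-mode Zenoh session from Python with a fast reconnection schedule. The router endpoint is a placeholder, and the `connect/retry` keys follow recent Zenoh default configurations, so check the config reference for the release you are using:

```python
# A sketch of a client-mode Zenoh session tuned to re-connect quickly after
# an outage (e.g. a robot entering a lift). The endpoint is a placeholder,
# and the retry keys mirror recent Zenoh default configs; they may differ
# across Zenoh versions.
import zenoh

conf = zenoh.Config()
conf.insert_json5("mode", '"client"')
conf.insert_json5("connect/endpoints", '["tcp/10.0.0.5:7447"]')
conf.insert_json5("connect/retry/period_init_ms", "500")   # first retry after 0.5 s
conf.insert_json5("connect/retry/period_max_ms", "2000")   # cap the backoff at 2 s

session = zenoh.open(conf)
```

The equivalent settings can also be written directly into the JSON5 configuration files passed to the Zenoh bridges and router.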

• What parts of the system (from Free Fleet’s implementation) should reside on the robot side, and what should be managed on the server side?

See the architecture illustration in the introduction of the repository: GitHub - open-rmf/free_fleet: A free fleet management system.

Only the Zenoh bridges are expected to be running on the robot during operations, with the proper configuration to connect to the Zenoh router. This can be set up as a system process that starts as part of the robot booting up (just like how the TurtleBot 4 launches a number of processes at startup, see TurtleBot 4 Setup · User Manual).
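
If it helps, a boot script could also run a small connectivity sanity check before or alongside bringing the bridge up. Here is a sketch using the Zenoh Python API; the endpoint and key expression are placeholders, and this is not part of free_fleet itself:

```python
# Minimal robot-side connectivity check (illustrative, not part of
# free_fleet): open a client session against the deployment's router and
# publish a heartbeat, so a startup script can confirm the router endpoint
# configured for the bridge is actually reachable.
import zenoh

conf = zenoh.Config()
conf.insert_json5("mode", '"client"')
conf.insert_json5("connect/endpoints", '["tcp/10.0.0.5:7447"]')  # your router

session = zenoh.open(conf)
session.put("diagnostics/robot_1/heartbeat", "online")
session.close()
```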

The Zenoh router and the free fleet adapter would need to run on the server side, alongside the rest of the Open-RMF stack.

In the example deployment (GitHub - open-rmf/rmf_deployment_template: This repo provides a template to deploy RMF), the Zenoh router and the free fleet adapter would be pods in the cluster (just make sure to set up the port access correctly).

• Are there any best practices or architectural guidelines for splitting responsibilities between the robot and server components to ensure reliability and performance at scale?

Unfortunately I’m not sure how to confidently answer this without more context. I’d say to always set up the Zenoh configurations based on the deployment site and networking requirements, and work from there. If robots at different sites will never get in each other’s way, occupy the same space, or share resources/infrastructure, it is always better to design those sites as separate Open-RMF deployments rather than one gigantic one, as it almost always makes every aspect of the deployment easier.