Service Load-Balancing

Hi ROS community,

We would like to open discussion and feedback for service load-balancing.

The Purpose

A ROS 2 service server must have enough resources to respond to all incoming requests, which can be difficult on edge devices. For the same service name, the current implementation allows multiple service clients but only one service server. If multiple service servers use the same service name, each of them receives every request from a service client and responds individually, which results in incorrect behavior.

So we want to support multiple service servers on the same service path to achieve redundancy and load balancing.

The Rough Design

Existing service client and service server applications can support this without code changes; they only need to remap the service path via command-line arguments at startup. For a service client, the service path is remapped like --ros-args -r add_two_ints:=add_two_ints/load_balancer. For service servers, the service path is remapped like --ros-args -r add_two_ints:=add_two_ints/load_balancer/server-1 or --ros-args -r add_two_ints:=add_two_ints/load_balancer/server-2.

A new load-balancing service node consists of:

  • A service server proxy.
    It accepts connections from external service clients.
    The service server proxy (e.g. add_two_ints/load_balancer) starts together with the node.

  • A load-balancing policy.
    It selects a service client proxy according to the configured load-balancing strategy.

  • A service client proxy.
    When a new backend service server is detected (e.g. add_two_ints/load_balancer/server-3), a new service client proxy is started to connect to the newly discovered backend service server.
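The discovery step for client proxies can be illustrated with a minimal plain-Python sketch. This is not rclpy code; the names `ClientProxy` and `ProxyRegistry` are hypothetical, and in a real node the discovered service list would come from the ROS graph API rather than a hard-coded list:

```python
# Hypothetical sketch: create one client proxy per discovered backend server.
BASE = "add_two_ints/load_balancer"

class ClientProxy:
    """Stands in for a real service client connected to one backend server."""
    def __init__(self, service_name):
        self.service_name = service_name

class ProxyRegistry:
    def __init__(self, base_path):
        self.base_path = base_path
        self.proxies = {}  # backend service name -> ClientProxy

    def update(self, discovered_services):
        """Create a client proxy for every newly seen backend server."""
        for name in discovered_services:
            # Backends register under <base>/server-N.
            if name.startswith(self.base_path + "/") and name not in self.proxies:
                self.proxies[name] = ClientProxy(name)

registry = ProxyRegistry(BASE)
registry.update([BASE + "/server-1", BASE + "/server-2", "unrelated/service"])
registry.update([BASE + "/server-3"])  # server-3 discovered later
```

Running `update` again with an already-known backend is a no-op, so the registry can be refreshed on every graph change event.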


Basic execution process

  • Request callback in Service Server Proxy

    • Get the writer GUID, sequence_number, and serialized request data (pointer), and put them into Request_Receive_Queue. (The writer GUID and sequence_number can be obtained from rmw_request_id_t.)
    • Notify that Request_Receive_Queue has changed.
  • Load-balance thread

    • Wait for a change of Request_Receive_Queue.
    • Get the writer GUID, sequence_number, and serialized request data (pointer) from Request_Receive_Queue.
    • According to the specified load-balancing policy, choose which Service Client Proxy to use.
    • Use the selected Service Client Proxy to send the serialized request data and get a new sequence_number.
    • Save the correspondence to the table. (When the Service Client Proxy receives a response, it refers to this table to determine which service client to send the result to.)
      writer GUID and sequence_number ↔ Service Client Proxy and sequence_number
    • Remove the writer GUID, sequence_number, and serialized request data from Request_Receive_Queue.
  • Response callback in Service Client Proxy

    • Get the Service Client Proxy (pointer), sequence_number, and serialized response data (pointer), and put them into Response_Receive_Queue.
    • Notify that Response_Receive_Queue has changed.
  • Forward response thread

    • Wait for a change of Response_Receive_Queue.
    • Get the Service Client Proxy (pointer) and sequence_number, and query the table for the corresponding writer GUID and sequence_number. Remove this entry from the table.
    • The Service Server Proxy sends the serialized response data with the writer GUID and sequence_number.
    • Remove the Service Client Proxy (pointer), sequence_number, and serialized response data from Response_Receive_Queue.
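The correspondence table at the heart of this process can be sketched in plain Python. The class and method names here are hypothetical; a real implementation would also need locking, since the load-balance thread and the forward-response thread access the table concurrently:

```python
# Hypothetical sketch of the correspondence table:
# (writer GUID, client sequence_number) <-> (Service Client Proxy, proxy sequence_number)

class CorrespondenceTable:
    def __init__(self):
        # (proxy_id, proxy_seq) -> (writer_guid, client_seq)
        self._table = {}

    def record(self, writer_guid, client_seq, proxy_id, proxy_seq):
        """Called by the load-balance thread after forwarding a request."""
        self._table[(proxy_id, proxy_seq)] = (writer_guid, client_seq)

    def resolve(self, proxy_id, proxy_seq):
        """Called by the forward-response thread; removes the entry once used."""
        return self._table.pop((proxy_id, proxy_seq))

table = CorrespondenceTable()
# A request from client (GUID "A", seq 7) is forwarded via proxy 2 as its seq 1:
table.record("A", 7, proxy_id=2, proxy_seq=1)
# The response arrives on proxy 2 with seq 1 -> reply to client "A" using seq 7:
writer_guid, client_seq = table.resolve(2, 1)
```

Popping the entry on resolve keeps the table bounded: each in-flight request occupies exactly one slot until its response is forwarded.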

Load balancing algorithm

  • Round Robin
    Assign requests to service servers sequentially, in request order.

  • Balance the number of requests
    Send new requests to the service server with the fewest currently active requests.

  • Balance response time
    Send new requests to the service server with the shortest average response time.
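The three policies above can be sketched in plain Python as interchangeable selectors over a list of backends. The class names are hypothetical, and the bookkeeping (in-flight counts, response times) would be fed by the load-balance and forward-response threads in a real node:

```python
import itertools

class RoundRobin:
    """Assign backends sequentially, in request order."""
    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def select(self):
        return next(self._cycle)

class FewestActive:
    """Pick the backend with the fewest in-flight requests."""
    def __init__(self, backends):
        self.active = {b: 0 for b in backends}  # in-flight request count

    def select(self):
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1  # request dispatched
        return backend

    def done(self, backend):
        self.active[backend] -= 1  # response received

class FastestAverage:
    """Pick the backend with the shortest average response time."""
    def __init__(self, backends):
        self.stats = {b: (0.0, 0) for b in backends}  # (total seconds, count)

    def select(self):
        def avg(b):
            total, n = self.stats[b]
            return total / n if n else 0.0
        return min(self.stats, key=avg)

    def observe(self, backend, seconds):
        total, n = self.stats[backend]
        self.stats[backend] = (total + seconds, n + 1)

rr = RoundRobin(["server-1", "server-2"])
picks = [rr.select() for _ in range(4)]  # alternates between the two backends
```

Since all three expose the same select() interface, the load-balance thread can treat the policy as a pluggable strategy chosen at startup.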


I would greatly appreciate any feedback or suggestions.
Please feel free to share your thoughts.


Can you give an example of the kind of service you think might need this? I fail to see it. Robots are not usually meant to do any heavy lifting for outside entities, their resources tend to be stretched enough by their own needs.

You mention edge devices; is this perhaps meant for edge computing, i.e., a server that provides services to the robots? If so then I don’t understand why you want to use ROS for that, given, as you noticed, it’s not a good fit for this. Why not just use TCP/HTTP directly together with an existing load balancer like HAProxy?

Thank you for your question.

I currently don’t have any real examples. For edge computing, HAProxy is a good solution.
I’m considering a ROS-only setup. Currently, all computation in the robot is provided through ROS services. As development progresses, we might find that heavy computation makes a device in the robot respond slowly. At that point, we could add new devices that provide the same services, without code changes.

The idea mentioned above has already been implemented, with some minor adjustments, of course.

You’re welcome to try it out and give your feedback.