Hi ROS community,
We would like to open a discussion and gather feedback on service load balancing.
The Purpose
A ROS 2 service must have enough resources to respond to all incoming requests, which is hard to guarantee on edge devices. For a given service name, the current implementation allows multiple service clients but only one service server. If multiple service servers share the same service name, each of them receives the same request from a service client and responds individually, which results in incorrect behavior.
So we want to allow multiple service servers on the same service path to achieve redundancy and load balancing.
The Rough Design
Existing service client and service server applications can support this without code changes; they only need to remap the service path with command-line arguments at startup. For the service client, the service path is remapped like `--ros-args -r add_two_ints:=add_two_ints/load_balancer`. For the service servers, the service path is remapped like `--ros-args -r add_two_ints:=add_two_ints/load_balancer/server_1`, `--ros-args -r add_two_ints:=add_two_ints/load_balancer/server_2`, and so on.
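As a concrete sketch, the helper below builds the startup commands for one client and two backend servers. It assumes the `add_two_ints_client` and `add_two_ints_server` executables from the `demo_nodes_cpp` package; the helper names are hypothetical. Backends are named `server_1`, `server_2` because ROS 2 names may only contain alphanumerics, underscores and forward slashes.

```python
"""Build `ros2 run` command lines for one client and N backend servers.

Assumption: demo_nodes_cpp's add_two_ints_client / add_two_ints_server;
the load_balancer/server_<n> layout follows the naming in this proposal.
"""

SERVICE = "add_two_ints"
LB_PATH = f"{SERVICE}/load_balancer"


def remap(target: str) -> list[str]:
    # `--ros-args -r from:=to` remaps the service name at startup.
    return ["--ros-args", "-r", f"{SERVICE}:={target}"]


def client_command() -> list[str]:
    # The client talks to the load balancer's server proxy.
    return ["ros2", "run", "demo_nodes_cpp", "add_two_ints_client", *remap(LB_PATH)]


def server_command(index: int) -> list[str]:
    # Each backend server gets its own path under the load balancer.
    return ["ros2", "run", "demo_nodes_cpp", "add_two_ints_server",
            *remap(f"{LB_PATH}/server_{index}")]


if __name__ == "__main__":
    print(" ".join(client_command()))
    for i in (1, 2):
        print(" ".join(server_command(i)))
```

Running it prints one line per process, for example `ros2 run demo_nodes_cpp add_two_ints_server --ros-args -r add_two_ints:=add_two_ints/load_balancer/server_1`.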
A new load-balancing service node consists of:
- Service server proxy: responsible for connecting external service clients. The service server proxy (e.g. `add_two_ints/load_balancer`) starts together with the node.
- Load-balancing policy: selects a service client proxy based on the current load-balancing strategy.
- Service client proxy: when a new backend service server is detected (e.g. `add_two_ints/load_balancer/server_3` is found), a new service client proxy is started to connect to the newly discovered backend service server.
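Backend discovery could be driven by the ROS graph. The sketch below assumes the node periodically polls rclpy's `Node.get_service_names_and_types()`, which returns `(name, [type, ...])` pairs with fully qualified names, and filters for backends under the load balancer's path. The `server_<n>` pattern and `find_new_backends` are hypothetical; it is written as a pure function so it can be exercised without a running ROS graph.

```python
import re

# Hypothetical naming convention: backends live at
# /add_two_ints/load_balancer/server_<n>.
LB_PATH = "/add_two_ints/load_balancer"
BACKEND_RE = re.compile(re.escape(LB_PATH) + r"/server_\d+")


def find_new_backends(service_names_and_types, known, service_type):
    """Return backend services that have no client proxy yet.

    service_names_and_types: list of (name, [type, ...]) pairs, the shape
    returned by rclpy's Node.get_service_names_and_types().
    known: set of backend names that already have a client proxy.
    service_type: the type the load balancer serves; backends advertising
    a different type are ignored.
    """
    new = []
    for name, types in service_names_and_types:
        if (BACKEND_RE.fullmatch(name)
                and service_type in types
                and name not in known):
            new.append(name)
    return sorted(new)


graph = [
    ("/add_two_ints/load_balancer", ["example_interfaces/srv/AddTwoInts"]),
    ("/add_two_ints/load_balancer/server_1", ["example_interfaces/srv/AddTwoInts"]),
    ("/add_two_ints/load_balancer/server_3", ["example_interfaces/srv/AddTwoInts"]),
]
print(find_new_backends(graph, {"/add_two_ints/load_balancer/server_1"},
                        "example_interfaces/srv/AddTwoInts"))
# → ['/add_two_ints/load_balancer/server_3']
```

The proxy's own name (`/add_two_ints/load_balancer`) never matches the pattern, so the load balancer does not try to connect to itself.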
Basic execution process
Request callback in Service Server Proxy
- Get the writer GUID, `sequence_number` and serialized request data (pointer), and put them into `Request_Receive_Queue`. (The writer GUID and `sequence_number` can be obtained from `rmw_request_id_t`.)
- Notify the change of `Request_Receive_Queue`.
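A minimal Python sketch of the request side follows. It assumes a hypothetical `on_request` hook that receives the two fields of `rmw_request_id_t` (`writer_guid`, a 16-byte GUID, and `sequence_number`) plus the serialized request as `bytes`. Python's `queue.Queue` is thread-safe and `put()` wakes any thread blocked in `get()`, so a single call covers both putting the entry into `Request_Receive_Queue` and notifying the change.

```python
import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingRequest:
    writer_guid: bytes        # rmw_request_id_t.writer_guid (16 bytes)
    sequence_number: int      # rmw_request_id_t.sequence_number
    serialized_request: bytes


# Request_Receive_Queue: put() both stores the entry and wakes a waiting consumer.
request_receive_queue: "queue.Queue[PendingRequest]" = queue.Queue()


def on_request(writer_guid: bytes, sequence_number: int, serialized_request: bytes) -> None:
    """Hypothetical request callback of the Service Server Proxy."""
    request_receive_queue.put(
        PendingRequest(writer_guid, sequence_number, serialized_request))
```

Using a ready-made thread-safe queue instead of a hand-rolled mutex plus condition variable keeps the callback short, which matters because it runs on the executor thread.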
Load-balance thread
- Wait for a change in `Request_Receive_Queue`.
- Get the writer GUID, `sequence_number` and serialized request data (pointer) from `Request_Receive_Queue`.
- Choose a Service Client Proxy according to the specified load-balancing policy.
- Send the serialized request data through the selected Service Client Proxy and get its `sequence_number`.
- Save the mapping in the table: writer GUID and `sequence_number` ↔ Service Client Proxy and `sequence_number`. (When a Service Client Proxy receives a response, it refers to this table to determine which service client the result must be sent to.)
- Remove the writer GUID, `sequence_number` and serialized request data from `Request_Receive_Queue`.
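The load-balance thread can be sketched on top of such a queue. `ClientProxy`, `send_request` and `LoadBalancer` are hypothetical stand-ins; a real proxy would send the serialized request to its backend server and get the sequence number from the middleware. Two choices differ slightly from the step list: `queue.Queue.get()` removes the entry as it reads it, merging the "get" and "remove" steps, and the table lock is held across sending and recording the mapping. Because a response can arrive as soon as the request is sent, the second choice ensures a forward-response thread that takes the same lock never looks up an entry before it exists.

```python
import itertools
import queue
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingRequest:
    writer_guid: bytes        # writer GUID of the external client's request
    sequence_number: int      # client-side sequence number
    serialized_request: bytes


class ClientProxy:
    """Hypothetical stand-in for a Service Client Proxy."""

    def __init__(self, name):
        self.name = name
        self._next_seq = itertools.count(1)
        self.sent = []

    def send_request(self, serialized_request):
        # A real proxy would forward the bytes to its backend server and
        # return the sequence number assigned to the outgoing request.
        seq = next(self._next_seq)
        self.sent.append((seq, serialized_request))
        return seq


class LoadBalancer:
    def __init__(self, proxies, choose):
        self.proxies = proxies
        self.choose = choose                  # policy: list[ClientProxy] -> ClientProxy
        self.requests = queue.Queue()         # Request_Receive_Queue
        # (proxy name, proxy sequence_number) -> (writer GUID, client sequence_number)
        self.table = {}
        self.table_lock = threading.Lock()

    def dispatch_one(self):
        req = self.requests.get()             # blocks until an entry arrives, then removes it
        proxy = self.choose(self.proxies)
        with self.table_lock:                 # send + record atomically w.r.t. lookups
            proxy_seq = proxy.send_request(req.serialized_request)
            self.table[(proxy.name, proxy_seq)] = (req.writer_guid, req.sequence_number)

    def run(self):
        # Body of the load-balance thread, e.g.
        # threading.Thread(target=lb.run, daemon=True).start()
        while True:
            self.dispatch_one()
```

The table is keyed by the proxy's name here instead of a pointer, which is the Python equivalent of a stable identity for each Service Client Proxy.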
Response callback in Service Client Proxy
- Get the Service Client Proxy (pointer), `sequence_number` and serialized response data (pointer), and put them into `Response_Receive_Queue`.
- Notify the change of `Response_Receive_Queue`.
Forward response thread
- Wait for a change in `Response_Receive_Queue`.
- Get the Service Client Proxy (pointer) and `sequence_number`, and query the table for the corresponding writer GUID and `sequence_number`. Remove this mapping from the table.
- The Service Server Proxy sends the serialized response data with that writer GUID and `sequence_number`.
- Remove the Service Client Proxy (pointer), `sequence_number` and serialized response data from `Response_Receive_Queue`.
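The response path mirrors the request path. In the sketch below, `on_response` plays the role of the response callback (it only enqueues into `Response_Receive_Queue`), and `forward_one` is one iteration of the forward-response thread. It assumes the mapping table is a dict keyed by (proxy name, proxy `sequence_number`) and shared with the load-balance thread under a lock; `send_response` is a hypothetical hook standing in for the Service Server Proxy sending the serialized response.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingResponse:
    proxy_name: str           # which Service Client Proxy received it
    sequence_number: int      # sequence number assigned by that proxy
    serialized_response: bytes


class ResponseForwarder:
    def __init__(self, table, table_lock, send_response):
        self.responses = queue.Queue()        # Response_Receive_Queue
        self.table = table                    # shared with the load-balance thread
        self.table_lock = table_lock
        self.send_response = send_response    # hypothetical Service Server Proxy send

    def on_response(self, proxy_name, sequence_number, serialized_response):
        """Response callback in a Service Client Proxy: enqueue only."""
        self.responses.put(
            PendingResponse(proxy_name, sequence_number, serialized_response))

    def forward_one(self):
        resp = self.responses.get()           # blocks, then removes the entry
        with self.table_lock:
            # pop() looks up and removes the mapping in one step.
            entry = self.table.pop((resp.proxy_name, resp.sequence_number), None)
        if entry is None:
            return False                      # unknown or duplicate response: drop it
        writer_guid, client_seq = entry
        self.send_response(writer_guid, client_seq, resp.serialized_response)
        return True

    def run(self):
        # Body of the forward-response thread.
        while True:
            self.forward_one()
```

Dropping responses without a table entry is one possible choice; a real implementation might also log them, since they usually indicate a backend answering twice or after a restart.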
Load balancing algorithm
- Round robin: allocate service servers in turn, following the order of incoming requests.
- Balance the number of requests: send new requests to the service server with the fewest currently active requests.
- Balance response time: send new requests to the service server with the shortest average response time.
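The three policies could be sketched as follows. `BackendStats`, `record_dispatch` and `record_response` are hypothetical bookkeeping helpers that the load balancer would call when it sends a request and when the matching response comes back. For the response-time policy, this sketch keeps an exponential moving average rather than a plain mean, an assumption that lets the average follow recent load. A newly discovered server starts at 0.0 and is therefore preferred until it has reported a response time.

```python
import itertools
from dataclasses import dataclass


@dataclass
class BackendStats:
    name: str
    active_requests: int = 0          # sent but not yet answered
    avg_response_time: float = 0.0   # seconds; exponential moving average (assumption)
    completed: int = 0


class RoundRobin:
    def __init__(self):
        self._counter = itertools.count()

    def choose(self, backends):
        # Modulo the current length, so newly discovered backends join the rotation.
        return backends[next(self._counter) % len(backends)]


def least_active(backends):
    # Fewest in-flight requests; ties go to the first backend in the list.
    return min(backends, key=lambda b: b.active_requests)


def fastest_average(backends):
    # Shortest average response time; ties go to the first backend in the list.
    return min(backends, key=lambda b: b.avg_response_time)


def record_dispatch(backend):
    backend.active_requests += 1


def record_response(backend, elapsed, alpha=0.2):
    backend.active_requests -= 1
    if backend.completed == 0:
        backend.avg_response_time = elapsed          # first sample seeds the average
    else:
        backend.avg_response_time = (alpha * elapsed
                                     + (1 - alpha) * backend.avg_response_time)
    backend.completed += 1
```

`alpha` controls how quickly the average reacts: larger values weight recent responses more heavily.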
I would greatly appreciate any feedback or suggestions.
Please feel free to share your thoughts.