Issues about fleet_adapter_template implementation (#648)

Posted by @Villy01:

Hi, I’m working on a project using rmf_deployment_template. I’m using Ubuntu 24 LTS and ROS2 Jazzy. I’m trying to deploy an adapter for my robot fleet via fleet_adapter_template. I managed to implement a working adapter, but I’m encountering some issues.

  • At the start of the adapter, the robots are added to the fleet, they are assigned an ID, and then a task is launched to send the robots to a charging station. Is it possible to modify or remove this last operation? The problem specifically lies in the fact that the parking task is executed as if it were navigation to a point. The robot moves correctly towards that point, but it doesn’t connect properly to the docking station because it doesn’t approach in the right direction.
  • When I try to launch a task, for example, a delivery task, without selecting a robot, I can see from the logs that the task dispatcher starts the procedure to search for a robot and my robot sends its cost estimate, but then the task is never assigned. When I try to launch a task and manually assign it to the robot, however, it is executed without issues.

Posted by @xiyuoh:

Hi @Villy01 !

For your question regarding docking, there are 2 things you can do, depending on what your intended behavior is:

  1. You can set the finishing request of your fleet to nothing to have them do nothing as an idle behavior, i.e. before and after tasks they will not automatically go to a specific waypoint to park or charge. You can also choose to set the request to park, and provide the parking waypoint name (example for tinyRobot1). This ensures that your robot navigates to a location you’ve chosen, instead of the docking station, and parks there when not performing any tasks.

  2. If you’d like to have it charging while idle, leave the finishing request as charge, and on your navigation graph, add a dock_name property to the charger waypoint. In your fleet adapter, you can implement a docking logic and trigger it for any waypoint with the dock_name property populated. We’ve implemented examples in our rmf_demos_fleet_adapter as well as fleet_adapter_mir. Let me know if you have further questions!

    Edit: I wanted to add that your robot will likely need to charge at some point anyway, so regardless of what the configured finishing request is, implementing a docking logic is the recommended way to go.

As for the task assignment issue, it’d help to provide some logs, videos/screenshots, commands, steps to reproduce, etc. in order for us to understand what’s going on on your end and debug accordingly.


Edited by @xiyuoh at 2025-03-24T15:44:32Z

Posted by @Villy01:

Thanks for the suggestion, I have resolved the docking issue. I wanted to temporarily ignore the task assignment problem and address it later because, while trying to execute some tasks assigned directly to the robot, I realized that even though I correctly loaded a navgraph for the robot, it does not follow it properly. Moreover, it fails to correctly reach all destinations.

In an older version of the system that used ROS Humble, I did not encounter these issues. For this implementation, using Jazzy, I used the same reference coordinates, which I am sure are correct because they were tested with the previous implementation. Could there be issues with coordinate transformation in this adapter?

Posted by @xiyuoh:

I realized that even though I correctly loaded a navgraph for the robot, it does not follow it properly. Moreover, it fails to correctly reach all destinations.

Can you elaborate on this issue, or provide the screenshots/config/graph that you used? It would also help to explain what the differences are in your implementation when switching from Humble to Jazzy, e.g. are you using EasyFullControl for both? What are the new Jazzy features you’re implementing in your fleet adapter? As for the transformation library, it has been tested with physical robots and works for multiple deployments, but if there are any bugs I’m happy to dig deeper.


Edited by @xiyuoh at 2025-03-26T06:41:45Z

Posted by @Villy01:

When working on Humble, I was using the Humble branch of this fleet_adapter_template; now that I’m switching to Jazzy, I’m using the main branch. For the migration, I transferred the APIs I had previously written, testing them directly on the robot. I updated the config according to the new template, while the nav graph remained the same.

These are some useful files

I’ll describe and also send you the logs of the last experiment I conducted. Specifically, I started my infrastructure and deployed the adapter. The startup behavior is not always the same—the robot is supposed to move toward the charging base, but it doesn’t always do so. I had to restart the adapter three times to get the expected behavior.

When the robot successfully moved to the charging base at startup, I attempted to send the same patrol task four times. The first two times, the robot didn’t move at all, and the task was immediately marked as completed, even though it was received and interpreted correctly. The third time, the robot executed the task almost correctly, moving toward the desired destination while following the correct traffic lines. However, near the end, it didn’t reach the exact destination and stopped too early. On the way back, it didn’t properly return to the charging base, stopping near it but failing to position itself correctly.

The fourth time I sent the task, it was queued but never started. Here you can find the logs from the adapter pod that received these tasks. There’s also a screenshot from the dashboard, showing the four assigned tasks.

Posted by @xiyuoh:

Thanks for providing logs for the latest attempt to run the tasks, it helps a lot to digest what has happened so that we can identify where these issues are coming from. In summary, it seems like we’ll need to make sure that the robot status (i.e. position) reported to RMF is accurate. It will help if you can have the robot position (in both the original robot coordinates and RMF coordinates) logged in the fleet adapter. I’ll address your concerns one by one:

The startup behavior is not always the same—the robot is supposed to move toward the charging base, but it doesn’t always do so. I had to restart the adapter three times to get the expected behavior.

Gotcha. I’m unable to visualize or understand what actually happened well enough to provide an explanation or solution. Could you take a video of the robot and/or provide logs when this behavior occurs?

The first two times, the robot didn’t move at all, and the task was immediately marked as completed, even though it was received and interpreted correctly.

Looking at your logs for the first patrol task here, it seems like RMF processed several navigation commands within a single second. I’m inclined to believe that the robot API implementation needs further refinement. Can you share your RobotClientAPI script? In particular I’d like to take a look at navigate and is_command_completed, but being able to reference other parts of the implementation will be helpful too. It is possible that the command is marked as completed prematurely, leading to RMF sending subsequent commands and thinking that the robot has arrived at the final destination.

The third time, the robot executed the task almost correctly, moving toward the desired destination while following the correct traffic lines. However, near the end, it didn’t reach the exact destination and stopped too early.

This can also happen when the robot inaccurately reports its current location to RMF. Depending on the size of your robot, you might want to configure the max_merge_waypoint_distance in your fleet config, e.g.

rmf_fleet:
  ...
  responsive_wait: False # Should responsive wait be on/off for the whole fleet by default? False if not specified.
  max_merge_waypoint_distance: 0.1
  ...

The max_merge_waypoint_distance is the threshold for RMF to consider whether a robot has arrived at a location. If the robot’s distance from the target location is within this threshold, the navigation will be marked as completed. Units are in metres.
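
To make the threshold concrete, here is a minimal sketch of the kind of distance check described above. It is only an illustration, not RMF’s actual implementation; the function name `within_merge_distance` and the sample coordinates are made up for this example.

```python
import math


def within_merge_distance(robot_xy, waypoint_xy, max_merge_waypoint_distance=0.1):
    """Return True if the robot is within the threshold (in metres) of the waypoint.

    Hypothetical helper for illustration only; RMF performs its own check internally.
    """
    dx = robot_xy[0] - waypoint_xy[0]
    dy = robot_xy[1] - waypoint_xy[1]
    return math.hypot(dx, dy) <= max_merge_waypoint_distance


# A robot reporting a position 5 cm from the target counts as arrived with a
# 0.1 m threshold, while one that is 30 cm away does not.
near = within_merge_distance((10.05, 4.0), (10.0, 4.0))
far = within_merge_distance((10.30, 4.0), (10.0, 4.0))
print(near, far)
```

If the position your adapter reports is offset from the robot’s true position by more than the threshold, a check like this can succeed or fail at the wrong place, which is why logging both coordinate frames is useful.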

On the way back, it didn’t properly return to the charging base, stopping near it but failing to position itself correctly.

Similar to above, a video of this with logs would help immensely for me to analyze what happened here.

The fourth time I sent the task, it was queued but never started. Here you can find the logs from the adapter pod that received these tasks. There’s also a screenshot from the dashboard, showing the four assigned tasks.

Thanks again for the logs. However, for the queued task that never started, I don’t have sufficient information to go on. Were there any more lines beyond the gist that you provided? If not, can you try running this again and, when it happens, take a screenshot of RViz or RMF web to show where the target location is and where the robot is currently positioned?

Posted by @Villy01:

Thank you for the feedback! I’m sending you my RobotClientAPI files right away, both in the version I previously used on Humble and the new one on Jazzy. As soon as I have access to the robots, I’ll try to record some videos or gather more detailed logs.
humble
jazzy

Posted by @xiyuoh:

One thing I noticed about the way you’re retrieving nav_status is the lack of an identifier for individual navigation commands, which is likely the reason why your adapter is mistakenly reporting command completed to RMF. Consider this scenario:

  • You sent a task for robot to travel from waypoint A to waypoint C, and the path to get there would be A-B-C.
  • RMF will send navigation commands to each waypoint one by one: to A, then to B (after robot reaches A), and finally to C (after robot reaches B).
  • In this case, RMF sends the navigation command to waypoint A, which the robot is already very near, and the adapter reports is_command_completed() = True to RMF.
  • RMF will then send the robot to waypoint B. However, since there are no identifiers for navigation commands to A/B/C, nav_status() will keep returning COMPLETED from the previous call for waypoint A. When the fleet adapter receives a navigation command to waypoint B, it would immediately be marked as completed, even though the robot has not started moving.
  • RMF would continue to send subsequent navigation commands until the task is supposedly completed, but the robot is actually still at the initial waypoint.

It might be helpful to refer to the way we track navigation commands by creating a cmd_id for each navigate and execute_action command in the rmf_demos_fleet_adapter - it allows us to differentiate between navigation commands to the robot. If your robot API itself has some kind of identifier that is similar to MiR’s mission_queue_id (an identifier tagged to every mission posted to the robot), you can use that directly too.
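
As a rough sketch of the idea (not the actual rmf_demos_fleet_adapter code), command tracking could look something like the following. The class `CommandTracker` and its method names are hypothetical, and it assumes the robot or fleet manager can echo back the id of the command it has finished.

```python
import itertools


class CommandTracker:
    """Hypothetical helper: tags every command with an id so that a stale
    completion report cannot be mistaken for the current command's result."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.current_cmd_id = None
        self.completed_cmd_id = None

    def start_command(self):
        # Call this whenever navigate() or execute_action() is sent to the robot.
        self.current_cmd_id = next(self._ids)
        return self.current_cmd_id

    def mark_completed(self, cmd_id):
        # Call this when the robot reports that the command with this id finished.
        self.completed_cmd_id = cmd_id

    def is_command_completed(self):
        # Only true if the finished command is the one most recently issued.
        return (self.current_cmd_id is not None
                and self.completed_cmd_id == self.current_cmd_id)


tracker = CommandTracker()
cmd_a = tracker.start_command()      # navigate to waypoint A
tracker.mark_completed(cmd_a)        # robot was already near A
done_a = tracker.is_command_completed()

cmd_b = tracker.start_command()      # navigate to waypoint B
done_b = tracker.is_command_completed()  # stale completion for A is ignored
print(done_a, done_b)
```

With this in place, the completion for waypoint A no longer leaks into the command for waypoint B, which is the failure mode described in the scenario above.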

Please update your adapter implementation and let me know if it works!

Posted by @Villy01:

I perfectly understood the case you described. I tried to implement a solution similar to the one you proposed, but the APIs of my robot do not handle the assignment of an ID for received commands.

In the two examples you sent me—the MiR example and the demo—the robots have this mechanism integrated into their APIs. Additionally, in the demo, there is also a fleet manager.

I was wondering, therefore, whether a system for managing IDs for each command is necessary for the adapter to function correctly, considering that it was not required for the Humble version.

And if it is necessary, why is there nothing in the template that refers to this?

Posted by @xiyuoh:

You are free to implement a similar mechanism even if your robot API does not offer identifiers. Off the top of my head, there are two approaches you can go with:

  • Track goal instead of an in-built identifier, and only mark navigation as complete if the new navigation goal pose matches (or is very near to) the goal returned by nav_status(), or
  • Track your own self.robot_status, and add a new enum IDLE = 5 under NavigationStatus to be used only within RobotClientAPI. With this you can toggle self.robot_status to IDLE once the previous command has been marked completed, then update it to MOVING when the robot starts performing the next command. You can use this status as a condition before returning True for subsequent calls to is_command_completed().
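
A minimal sketch of the first approach (goal tracking) is shown below. Everything here is an assumption made for illustration: the `NavStatus` enum stands in for whatever your robot API reports, and `GoalTrackingClient`, `goal_tolerance` and the method signatures are hypothetical rather than part of the template.

```python
import math
from enum import Enum


class NavStatus(Enum):
    # Hypothetical stand-in for the status values your robot API returns.
    MOVING = 1
    COMPLETED = 2


class GoalTrackingClient:
    """Hypothetical wrapper that remembers the last goal sent to the robot."""

    def __init__(self, goal_tolerance=0.1):
        self.goal_tolerance = goal_tolerance  # metres
        self.current_goal = None

    def navigate(self, goal_xy):
        # Remember the goal; the real call to the robot API would go here.
        self.current_goal = goal_xy
        return True

    def is_command_completed(self, reported_status, reported_goal_xy):
        # A COMPLETED status only counts if it refers to the goal we last sent.
        if self.current_goal is None or reported_goal_xy is None:
            return False
        if reported_status is not NavStatus.COMPLETED:
            return False
        return math.dist(self.current_goal, reported_goal_xy) <= self.goal_tolerance


client = GoalTrackingClient()
client.navigate((5.0, 2.0))
# The robot still reports COMPLETED for the previous goal at (0, 0): ignored.
stale = client.is_command_completed(NavStatus.COMPLETED, (0.0, 0.0))
# The robot reports COMPLETED for a goal matching the one we sent: accepted.
fresh = client.is_command_completed(NavStatus.COMPLETED, (5.02, 2.0))
print(stale, fresh)
```

The second approach works along the same lines, except that the condition guarding the return value is the locally tracked status (IDLE versus MOVING) instead of the goal pose.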

The main difference between the Humble and Jazzy fleet adapters is the use and demonstration of the EasyFullControl library. While this library is compatible with Humble, we only officially released binaries from Iron onwards. The Humble demos fleet adapter also stores a current_cmd_id to ensure that we accurately track the progress of commands relayed to robots.

We don’t add this to the template because as you can see different robots offer different APIs that require their own logics; we wouldn’t want to restrict the way users implement their is_command_completed() callback. Managing identifiers is not always necessary, e.g. depending on update interval frequencies, some robots have their own IDLE status that is updated before the next navigation command is received, eliminating the need for such tracking. However, we do demonstrate with examples as shared above to showcase various ways to implement.

With that said, I agree that it’d improve clarity to highlight this in a comment within the template, and I’m happy to add a note in. Other than that, let me know whether similar or new issues come up after testing with these suggestions.

Posted by @xiyuoh:

Re: my previous comment

With that said, I agree that it’d improve clarity to highlight this in a comment within the template, and I’m happy to add a note in.

I notice the template already mentioned that the callback should return the completion status of the last command, so I won’t be updating with any additional note.

Posted by @Villy01:

Thanks for the clarification and the advice! I just had to slightly modify RobotClientAPI, saving the current_goal, to keep track of the last assigned command.
The only issue that came up is that in the dashboard, in the task section list, the task status changes from queued when assigned to completed when it’s actually finished, but it no longer goes through the executing state. Do you happen to have an explanation for this?

Furthermore, I still have the issue where, when I don’t assign the task directly to the robot, it makes an offer but is never selected. I also noticed that if I send a direct task to the robot after a while, it executes it immediately.

I’ll send you some logs from different pods. In this case, I started the adapter, created a new task without assigning it directly to the robot, waited for a while, and then sent a new task, assigning it directly to the robot. The second task started immediately, while the first one never got assigned to the robot. Maybe you can give me some insights. Thanks a lot for all the support!

Posted by @xiyuoh:

The only issue that came up is that in the dashboard, in the task section list, the task status changes from queued when assigned to completed when it’s actually finished, but it no longer goes through the executing state

The dashboard’s task status is updated from RMF via task state updates. You may try to do a ros2 topic echo /task_state_update -f and check what’s the reported status for the task when reflected as queued, executing or completed on the dashboard.

Furthermore, I still have the issue where, when I don’t assign the task directly to the robot, it makes an offer but is never selected. I also noticed that if I send a direct task to the robot after a while, it executes it immediately.

From your logs it looks like the Auctioneer never processed the bid proposals. Could you provide some additional context:

  • What are the commands you submitted for both dispatch and direct tasks?
  • Did you set any bidding time window?
  • Did the dispatch (non-direct) task come up on the dashboard at all? Do check the output of /task_state_update for this as well while debugging.

Posted by @Villy01:

Could you provide some additional context

  • I submitted both tasks from the dashboard and I can see them in the tasks list
  • I set bidding time window to 2.0
  • I ran some tests checking the output on the topic /task_state_update, but I don’t see any messages published when I send a task. From the topic info, I can see that it has one publisher, /task_state_update, but no subscribers.

From the adapter pod logs, I can see that a request for a new task is received and a proposal with the estimated cost is sent. My concern is that this proposal might not actually be received by anyone, as if communication is happening in one direction but not the other.

[INFO] [1743169298.038880050] [reeman_big_dog_fleet_adapter]: [Bidder] Received Bidding notice for task_id [patrol.dispatch-0]
[INFO] [1743169298.039154188] [reeman_big_dog_fleet_adapter]: Planning for [1] robot(s) and [1] request(s)
[INFO] [1743169298.048491808] [reeman_big_dog_fleet_adapter]: Submitted BidProposal to accommodate task [patrol.dispatch-0] by robot [reeman_bigDog] with new cost [1743169359.653680]

In my infrastructure, I currently have only one robot, and I was wondering if this issue could be caused by the assignment process having some problems when there is only one robot.

Posted by @Villy01:

In my infrastructure, I currently have only one robot, and I was wondering if this issue could be caused by the assignment process having some problems when there is only one robot.

A small update: I tried adding a new robot, but even with two robots, the offers from the two adapters are not being received by the task dispatcher. I also tried to set the bidding time window to 10 sec but nothing changed.

Posted by @xiyuoh:

Due to hardware and accessibility limitations it’s taking some time for me to set up the deployment template, so please bear with me while I work on that to try reproducing what you’re facing.

In the meantime, can you add these arguments --ros-args --log-level debug --log-level rcl:=INFO to your task dispatcher node? They’ll be able to help us understand a little more about what’s going on with the Auctioneer. Here are some examples of the debug logs:

If successful,

[rmf_task_dispatcher-13] [INFO] [1743743718.235844326] [rmf_dispatcher_node]: Add Task [patrol.dispatch-6decb7ea6a] to a bidding queue
[rmf_task_dispatcher-13] [INFO] [1743743718.435105005] [rmf_dispatcher_node]:  - Start new bidding task: patrol.dispatch-6decb7ea6a
[rmf_task_dispatcher-13] [DEBUG] [1743743718.438631765] [rmf_dispatcher_node]: [Auctioneer] Receive proposal from task_id: patrol.dispatch-6decb7ea6a | from: tinyRobot
[rmf_task_dispatcher-13] [DEBUG] [1743743720.634312797] [rmf_dispatcher_node]: Bidding Deadline reached for [patrol.dispatch-6decb7ea6a]
[rmf_task_dispatcher-13] [INFO] [1743743720.634405817] [rmf_dispatcher_node]: Determined winning Fleet Adapter: [tinyRobot], from 1 responses
[rmf_task_dispatcher-13] [INFO] [1743743720.634432197] [rmf_dispatcher_node]: Dispatcher Bidding Result: task [patrol.dispatch-6decb7ea6a] is awarded to fleet adapter [tinyRobot], with expected robot [tinyRobot1].

If there are errors,

[rmf_task_dispatcher-13] [INFO] [1743743739.123752394] [rmf_dispatcher_node]: Add Task [compose.dispatch-29ca35c6fc] to a bidding queue
[rmf_task_dispatcher-13] [INFO] [1743743739.234308813] [rmf_dispatcher_node]:  - Start new bidding task: compose.dispatch-29ca35c6fc
[rmf_task_dispatcher-13] [DEBUG] [1743743739.234937093] [rmf_dispatcher_node]: [Auctioneer] Received 1 errors from a bidder
[rmf_task_dispatcher-13] [DEBUG] [1743743741.434297946] [rmf_dispatcher_node]: Bidding Deadline reached for [compose.dispatch-29ca35c6fc]
[rmf_task_dispatcher-13] [WARN] [1743743741.434558116] [rmf_dispatcher_node]: Dispatcher Bidding Result: task [compose.dispatch-29ca35c6fc] has no submissions during bidding. Dispatching failed, and the task will not be performed.
[rmf_task_dispatcher-13] [ERROR] [1743743741.434593926] [rmf_dispatcher_node]: No submission error[1]: waypoint name for Place [clean_lobby] cannot be found in the navigation graph

Even without the debug logs, we should have at least seen a No submission error if the BidProposal from your fleet adapter never reached the task dispatcher, so something fishy is going on in your deployment. We’ll also want to see if Bidding Deadline reached from the debug logs ever appears. If you made any modifications to the template, please list them as it would help me understand and replicate these issues you have.

Lastly, feel free to revert the bidding time window to the default 2.0, and the number of robots is unlikely the cause here.

Posted by @Villy01:

I tried adding the parameters you suggested, and it seems to me that the task dispatcher receives the robots’ proposals but never times out with the bidding window. Here are the logs.
I didn’t make any modifications to the template.


Edited by @Villy01 at 2025-04-04T08:35:07Z

Posted by @xiyuoh:

I notice that the original deployment template’s use_unique_hex_string_with_task_id param is True, while both of your task dispatchers set it to False (here and here). Can I check again if you modified anything at all in the original template?

Posted by @Villy01:

I’ll send you my deployment file, but as you can see I left that param set to true. I don’t know why it is set to false in the pod, I’ll check. The only modifications I made were to the Docker image pulls, as I build the images locally and force the deployment to always use them.

edit:
I realized that I left the value of use_sim_time set to true in the values.yaml file. By setting it to false, I solved the problem, and now the task dispatch works correctly.

Regarding the value of use_unique_hex_string_with_task_id, the commented line about the use of ws connection is probably causing issues. By removing it completely, the value is correctly set to true.

Thank you so much for the time and the effort!


Edited by @Villy01 at 2025-04-04T10:01:45Z

Posted by @xiyuoh:

Hi @Villy01 , appreciate your patience, I finally managed to set up the deployment template and tested running tasks via the dashboard. Things are working well for me in simulation - indirect tasks are dispatched as they should, and the robots are able to perform them accordingly. Can I check a few things on your end:

  • What are your ENABLE_RMF_SIM and ENABLE_RMF values? Only one of them should be true
  • Are you running your deployment in sim or hardware?
  • When you’re facing these issues, what value did you set for RMF_USE_SIM_TIME?
  • How are you bringing up your fleet adapter pod? Are you
    a. Modifying this rmf-site.yaml, and replacing values for the rmf-tinyrobot-fleet-adapter app? Or
    b. Writing a new rmf-site.yaml file? If so, did you import any RMF_USE_SIM_TIME values? Can you provide this file in a gist?

Also, can you try out a clean set of rmf namespaced deployments with ENABLE_RMF_SIM set to true and ENABLE_RMF to false? This means the only rmf namespaced pods running should be keycloak- and web-related, and rmf-sim. Send a task to the demo simulated world via the dashboard. Let me know if this doesn’t work for you.