Handling Robot Breakdowns in Open RMF (#634)

Posted by @alex-roba:

I’m exploring how Open RMF handles scenarios where a robot breaks down in the middle of a task. Assuming the API can report such an event, how does RMF ensure seamless task recovery and reassignment? Are there built-in mechanisms for rerouting or pausing other robots, reassigning tasks, or notifying operators?

I’d appreciate any insights on best practices for handling this scenario smoothly.

Chosen answer

Answer chosen by @alex-roba at 2025-03-12T03:58:16Z.
Answered by @aaronchongth:

The feature you might be looking for is decommissioning a robot; see “Stabilize commissioning feature” by mxgrey (open-rmf/rmf_ros2 Pull Request #338) and “Hammer/decommission” by aaronchongth (open-rmf/rmf-web Pull Request #920). Decommissioning a robot tells Open-RMF not to let this robot participate in any tasks until it is recommissioned.

a robot breaks down in the middle of a task

In the scenario you have described, decommissioning the robot also lets users choose between a few behaviors:

  • reassign all queued tasks for this robot to other robots, and/or
  • allow idle behaviors (previously known as finishing requests; Open-RMF continues to send the robot to charging, parking, etc.)
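To make the two choices above concrete, here is a minimal toy model of what decommissioning changes. It is a sketch under stated assumptions, not Open-RMF's implementation: `Commission`, `Robot`, `decommission`, and the shortest-queue reassignment policy are all hypothetical stand-ins (Open-RMF runs its own task allocation when it reassigns queued tasks).

```python
from dataclasses import dataclass, field


@dataclass
class Commission:
    # Hypothetical flags mirroring the choices described above.
    accept_tasks: bool = True
    perform_idle_behavior: bool = True


@dataclass
class Robot:
    name: str
    commission: Commission = field(default_factory=Commission)
    queue: list = field(default_factory=list)  # tasks waiting for this robot


def decommission(robot, fleet, reassign_queued=True, allow_idle=False):
    """Stop dispatching to `robot` and optionally hand its queued tasks to peers."""
    robot.commission = Commission(accept_tasks=False,
                                  perform_idle_behavior=allow_idle)
    if not reassign_queued:
        return
    peers = [r for r in fleet if r is not robot and r.commission.accept_tasks]
    if not peers:
        return  # nobody can take the work, so it stays queued on this robot
    while robot.queue:
        task = robot.queue.pop(0)
        # Toy policy: the peer with the shortest queue wins. Open-RMF uses its
        # own task allocation here; this is only a stand-in.
        min(peers, key=lambda r: len(r.queue)).queue.append(task)


fleet = [Robot("r1", queue=["t1", "t2", "t3"]), Robot("r2"), Robot("r3", queue=["t4"])]
decommission(fleet[0], fleet)
```

After the call, r1 accepts no new tasks, has idle behaviors disabled, and its three queued tasks are spread across r2 and r3.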

Note, however, that Open-RMF is unable to reassign the robot’s current ongoing task, since it is already underway. Operators will generally follow this workflow in this scenario:

  1. Decommission the robot, reassign its queued tasks to other robots, and disable idle behaviors.
  2. Cancel the robot’s current ongoing task (since idle behaviors are disabled, the robot should just stop wherever it is).
  3. Fix the robot.
  4. Once done, recommission the robot, and Open-RMF will start dispatching tasks to this now-idle robot. (Queued tasks for other robots may also be shuffled and assigned to this robot.)
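The order of the four steps matters, so it can help to think of them as a small state machine. The sketch below is hypothetical (the state names and action strings are illustrative, not Open-RMF identifiers); it simply rejects out-of-order steps, such as cancelling the current task before the robot is decommissioned with idle behaviors off.

```python
from enum import Enum, auto


class State(Enum):
    WORKING = auto()
    DECOMMISSIONED = auto()  # no new tasks, queued tasks moved, idle behaviors off
    STOPPED = auto()         # current task cancelled; robot stays where it is
    REPAIRED = auto()
    AVAILABLE = auto()       # recommissioned; dispatcher may assign tasks again


# (current state, operator action) -> next state, in the order listed above.
TRANSITIONS = {
    (State.WORKING, "decommission"): State.DECOMMISSIONED,
    (State.DECOMMISSIONED, "cancel_current_task"): State.STOPPED,
    (State.STOPPED, "repair"): State.REPAIRED,
    (State.REPAIRED, "recommission"): State.AVAILABLE,
}


def run_workflow(actions, state=State.WORKING):
    """Apply operator actions in order, rejecting any out-of-order step."""
    history = [state]
    for action in actions:
        nxt = TRANSITIONS.get((state, action))
        if nxt is None:
            raise ValueError(f"cannot {action!r} while {state.name}")
        state = nxt
        history.append(state)
    return history
```

Cancelling first is rejected because, with idle behaviors still enabled and the robot still commissioned, it would be sent to charge or park (or given a new task) instead of stopping in place.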

Posted by @alex-roba:

@aaronchongth @mxgrey Thanks for the detailed response! The decommissioning workflow makes a lot of sense. However, I have some concerns regarding the behavior where the robot “just stops wherever it is.”

If a robot breaks down in the middle of traffic, other robots may continue operating around it, relying on onboard navigation to avoid collisions. While this prevents direct crashes, wouldn’t this still disrupt traffic flow? Wouldn’t it be more effective to pause nearby robots (or those on the same floor) until a human intervenes to remove the stalled robot, after which normal operation can resume?

Additionally, is it possible to automate the first two steps (decommissioning and task reassignment) through the fleet adapter as soon as a breakdown signal is received from the robot? Is there an API in Open RMF (or something like EasyFleet) that could facilitate this, along with triggering an alert for human intervention?


Edited by @alex-roba at 2025-03-12T03:45:25Z

Posted by @mxgrey:

To elaborate on why Open-RMF does not reassign tasks that have already been started:

We’ve found that there’s no generally sound way to move a task that has already started from one robot to another. Open-RMF would need an enormous amount of context on what it means to reassign the task. E.g. if a delivery task is reassigned but the payload was already picked up, what does it mean to reassign the delivery? Should the new robot go pick up a new payload from the same location as the original? Should the new robot wait until the payload has been manually moved to it? All of this will depend on the context, e.g. is the payload fungible or is it unique? Can a human intervene to transfer the payload from one robot to another?

Then when you start to consider the more general problem of reassigning any composed task from one robot to another, all of the above questions become combinatorial in the number of events that happen in the task.

Ultimately the system integrator needs to decide when and how reassignment should happen for tasks that have already started. For example you can monitor the progress of the task and then present different options to the operator for how they should handle a task that was interrupted based on where in the task flow it was interrupted.
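As an illustration of the integrator-side logic described above, here is a hypothetical sketch of how a dashboard might map "where the task was interrupted" plus one piece of context (is the payload fungible?) to operator options for a simple delivery. The phase names and option labels are invented for this example and are not part of any Open-RMF API. Note how every extra event in a task, or extra context flag, adds branches, which is the combinatorial growth mentioned above.

```python
from enum import Enum


class Phase(Enum):
    # Hypothetical phases of a simple pickup-and-deliver task.
    TO_PICKUP = "to_pickup"      # driving to the pickup point, nothing loaded
    CARRYING = "carrying"        # payload is on the robot
    DELIVERED = "delivered"      # payload dropped off; only the return leg remains


def recovery_options(phase, payload_fungible):
    """Options an operator dashboard might offer for an interrupted delivery."""
    if phase is Phase.TO_PICKUP:
        # No payload involved yet, so a fresh task is an easy, safe choice.
        return ["submit_new_delivery_task"]
    if phase is Phase.CARRYING:
        options = ["human_moves_payload_then_new_dropoff_task"]
        if payload_fungible:
            # An identical item can be fetched from the original pickup point.
            options.append("new_delivery_from_original_pickup")
        return options
    # The delivery itself succeeded; nothing needs to move to another robot.
    return ["mark_delivery_complete"]
```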

Posted by @mxgrey:

Wouldn’t it be more effective to pause nearby robots (or those on the same floor) until a human intervenes to remove the stalled robot

Whether this is the best course of action is going to be situational, so we don’t encode this behavior by default.

However, if this is what you want, you can roughly achieve it using mutex groups. If there’s an area where it would be problematic for robots to pile up, then mutex groups are a good thing to use in that area anyway. A robot that has been decommissioned will hold onto any mutex group that it had immediately before being decommissioned, so that will keep the other robots away from it. However, this also means you’ll need a manual release for the mutex group on your dashboard. I believe rmf-web provides a widget for this.
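The mutex-group behavior described above can be pictured with a toy registry. This is a simplified sketch, not Open-RMF's mutex implementation: `MutexGroups`, its methods, and the group name `corridor_A` are hypothetical. It shows the three points from the answer: one holder per group, a decommissioned robot keeps its group, and only a manual release lets others in.

```python
class MutexGroups:
    """Toy registry in which each mutex group has at most one holder."""

    def __init__(self):
        self._holder = {}            # group name -> robot name
        self._decommissioned = set()

    def request(self, robot, group):
        """Return True if `robot` may enter the area guarded by `group`."""
        holder = self._holder.get(group)
        if holder is None or holder == robot:
            self._holder[group] = robot
            return True
        return False                 # someone else holds it: wait outside

    def decommission(self, robot):
        # The robot keeps every group it already holds.
        self._decommissioned.add(robot)

    def release(self, robot, group):
        """Normal release as a robot leaves the area."""
        if robot in self._decommissioned or self._holder.get(group) != robot:
            return False             # a stalled robot never leaves, so no release
        del self._holder[group]
        return True

    def manual_release(self, group):
        """Operator override (e.g. from a dashboard); returns the old holder."""
        return self._holder.pop(group, None)


groups = MutexGroups()
groups.request("r1", "corridor_A")                 # r1 enters the corridor
groups.decommission("r1")                           # r1 breaks down inside it
r2_blocked = not groups.request("r2", "corridor_A")
self_release = groups.release("r1", "corridor_A")  # False: still held
released_from = groups.manual_release("corridor_A")
r2_enters = groups.request("r2", "corridor_A")
```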

is it possible to automate the first two steps (decommissioning and task reassignment) through the fleet adapter as soon as a breakdown signal is received from the robot?

Yes, the C++ API in rmf_fleet_adapter can be found here, and the Python binding is here. You could also integrate against the robot_commission_request API if you want to do a web-based integration.
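A minimal sketch of automating the first two steps on a breakdown signal, assuming your fleet adapter receives periodic status updates as a dict with hypothetical `fault` and `fault_message` keys. The three callables are placeholders to wire to the real rmf_fleet_adapter API (C++ or Python binding) or to the robot_commission_request API; their names are hypothetical and no real signatures are implied. Recommissioning is intentionally left to a human.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class BreakdownHandler:
    # Hooks to wire to your real integration (fleet adapter API or
    # robot_commission_request); these names are hypothetical.
    decommission: Callable[[str], None]           # also reassigns queued tasks, idle off
    cancel_current_task: Callable[[str], None]
    alert_operator: Callable[[str, str], None]
    handled: set = field(default_factory=set)     # robots already handled

    def on_status(self, robot, status):
        """Call from the adapter's periodic status update; True if it acted."""
        if not status.get("fault") or robot in self.handled:
            return False
        self.handled.add(robot)  # act once per breakdown, not on every update
        self.decommission(robot)
        self.cancel_current_task(robot)
        self.alert_operator(robot, status.get("fault_message", "unknown fault"))
        return True

    def on_repaired(self, robot):
        # Recommissioning stays a human decision; just re-arm the handler.
        self.handled.discard(robot)


log = []
handler = BreakdownHandler(
    decommission=lambda r: log.append(("decommission", r)),
    cancel_current_task=lambda r: log.append(("cancel", r)),
    alert_operator=lambda r, msg: log.append(("alert", r, msg)),
)
first = handler.on_status("r1", {"fault": False})
second = handler.on_status("r1", {"fault": True, "fault_message": "motor stall"})
third = handler.on_status("r1", {"fault": True, "fault_message": "motor stall"})
```

The `handled` set keeps repeated fault reports from the same robot from triggering the sequence again until the robot has been marked repaired.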

Posted by @aaronchongth:

There has recently been an update to our documentation, which now includes more information about mutex group usage too; check out Graph Strategies - Programming Multiple Robots with ROS 2.

Posted by @alex-roba:

there has recently been an update to our documentation, and there is more information about mutex group usage now too, check out Graph Strategies - Programming Multiple Robots with ROS 2

@aaronchongth Sorry if this seems unrelated, but after reviewing the new documentation, I have a question regarding Task Waypoints vs. Holding Points. The documentation mentions that one way to differentiate them is by assigning a low speed limit to lanes connected to a Task Waypoint.

However, I also noticed that the Traffic Editor has a property called “is_holding_waypoint.” What is the purpose of this property, and is it related to the distinction between Task Waypoints and Holding Points?


Edited by @alex-roba at 2025-03-12T07:03:42Z

Posted by @aaronchongth:

A robot is allowed to visit a Holding Waypoint during traffic deconfliction, as mentioned in the documentation. This allows robots to pass each other in long, narrow corridors, with one robot moving to a side waypoint for a while.

A Task Waypoint is a waypoint that a robot has to visit in order to perform a task, for example a dispenser or a lot for picking up a cart. However, Open-RMF also treats Task Waypoints as ordinary waypoints along the way, which can be used to let robots pass each other. Ideally, we would only want robots performing that specific task to ever enter a Task Waypoint, and never use a Task Waypoint to deconflict traffic.

The suggested solution of a low-speed lane leading to a Task Waypoint discourages Open-RMF from planning paths into that waypoint during traffic deconfliction, so that the Task Waypoint is only used when necessary (i.e. when the task requires it). A low speed limit on that lane incurs a larger cost during planning, so if there is a Holding Waypoint nearby, Open-RMF will prefer to use that Holding Waypoint for traffic deconfliction rather than the Task Waypoint.
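To see why a low speed limit works as a deterrent, here is a small sketch assuming, as the answer implies, that planning cost grows with travel time (lane length divided by speed limit). The waypoint names, lengths, and speeds are made up, and Open-RMF's actual planner cost model is more involved than this plain Dijkstra search. Even though the dispenser is closer, the slow lane makes it the more expensive place to pull aside.

```python
import heapq

# Hypothetical lanes: (from, to, length_m, speed_limit_mps)
LANES = [
    ("corridor", "dispenser", 2.0, 0.1),   # Task Waypoint: short lane, low speed limit
    ("corridor", "holding", 4.0, 1.0),     # Holding Waypoint: longer lane, normal speed
    ("corridor", "exit", 10.0, 1.0),
]


def travel_times(lanes, start):
    """Dijkstra over lanes, with cost = length / speed limit (seconds)."""
    graph = {}
    for a, b, length, speed in lanes:
        cost = length / speed
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))   # treat lanes as bidirectional
    best = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        t, node = heapq.heappop(heap)
        if t > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, cost in graph.get(node, []):
            nt = t + cost
            if nt < best.get(nxt, float("inf")):
                best[nxt] = nt
                heapq.heappush(heap, (nt, nxt))
    return best


times = travel_times(LANES, "corridor")
pull_aside = min(["dispenser", "holding"], key=times.__getitem__)
```

Here reaching the dispenser costs about 20 s while the holding point costs 4 s, so the holding point is chosen for pulling aside.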

Posted by @alex-roba:

@aaronchongth I understand that, but my question is specifically about the “is_holding_waypoint” property in Traffic Editor. What is its purpose?

Posted by @mxgrey:

“is_holding_waypoint” property in Traffic Editor. What is its purpose?

It gives a hint to the traffic negotiation system about what points in the environment it can leverage to send a robot that needs to pull aside to let another robot pass. It’s not strictly necessary to mark holding points for negotiation to work, but in many cases it can make the negotiation process an order of magnitude faster and allow the planners to solve scenarios that might otherwise require too much graph search.

You can think of it as a hint that improves the multi-agent path finding algorithm’s heuristic.
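As a rough illustration of how a holding-point hint gives the search an explicit goal, the sketch below does a breadth-first search from a robot that must yield, looking for a waypoint off the other robot's path. Marked holding points are returned as soon as they are found; any other off-path waypoint is only a fallback. This is a simplification for intuition, not Open-RMF's negotiation algorithm, and the graph and names are hypothetical.

```python
from collections import deque


def nearest_pull_aside(adj, start, blocked_path, holding_points):
    """BFS for a waypoint where `start` can wait off `blocked_path`.

    Marked holding points are preferred; any off-path waypoint is a fallback.
    """
    blocked = set(blocked_path)
    seen = {start}
    queue = deque([start])
    fallback = None
    while queue:
        node = queue.popleft()
        if node != start and node not in blocked:
            if node in holding_points:
                return node          # the hint: a known-good place to wait
            if fallback is None:
                fallback = node      # remember the closest unmarked option
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return fallback


# Corridor a-b-c-d with two side waypoints: a dispenser (task) and a holding point.
ADJ = {
    "a": ["b"],
    "b": ["a", "c", "dispenser"],
    "c": ["b", "d", "holding"],
    "d": ["c"],
    "dispenser": ["b"],
    "holding": ["c"],
}
other_robot_path = ["a", "b", "c", "d"]
```

With `{"holding"}` marked, the robot at `b` is sent to `holding`; with no marked holding points, the search settles for the nearer `dispenser`, which is exactly the Task Waypoint one would rather keep free.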


Edited by @mxgrey at 2025-03-12T08:49:21Z