Error codes in NavigateToPose/NavigateThroughPoses

Hello there!

I opened this ticket, but I think this is a better place to discuss this topic.

We have several use cases where we call a NavigateToPose action from a high-level system, and we want to execute other actions depending on the error codes returned by NavigateToPose. Did my robot fail to reach the goal because no path to the goal could be created, or because the space was so cluttered that the controller could not move? Or was it…?

However, to my surprise, there aren’t any error codes like those in the planner, controller, behaviors, …

So, I’m proposing adding error codes to this action and to NavigateThroughPoses. I’m considering these two options:

  • Create new error codes relevant to NavigateToPose by “mixing” the error codes from the subservers.
  • Because the NavigateToPose action must have a planner and a controller, populate some relevant error codes from them.
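A minimal sketch of the second option, assuming NavigateToPose gained its own result codes and the navigator translated whatever the planner or controller reported into them. All code names, numeric values, and the `populate_error` helper are hypothetical placeholders, not Nav2’s actual definitions.

```python
from enum import IntEnum
from typing import Optional


class NavigateToPoseError(IntEnum):
    """Hypothetical NavigateToPose-level result codes."""
    NONE = 0
    UNKNOWN = 1
    NO_VALID_PATH = 2      # the planner could not find a path
    CONTROLLER_STUCK = 3   # the controller could not make progress


# Hypothetical mapping from (server, subserver code) to a NavigateToPose code.
SUBSERVER_TO_NAV = {
    ("planner", 1): NavigateToPoseError.NO_VALID_PATH,
    ("controller", 1): NavigateToPoseError.CONTROLLER_STUCK,
}


def populate_error(planner_code: Optional[int],
                   controller_code: Optional[int]) -> NavigateToPoseError:
    """Pick the NavigateToPose code from whichever subserver failed.

    None or 0 means that subserver reported no failure. Checking the planner
    first is an arbitrary priority choice made for this sketch.
    """
    for server, code in (("planner", planner_code),
                         ("controller", controller_code)):
        if code:
            return SUBSERVER_TO_NAV.get((server, code), NavigateToPoseError.UNKNOWN)
    return NavigateToPoseError.NONE
```

Subserver codes that have no entry in the table collapse to `UNKNOWN`, so the action can always return something, even when a server adds codes the navigator doesn’t know about yet.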

I think this is a really interesting topic that I’d like some broader community input on!

The fact that we execute arbitrary behavior trees makes this a little trickier to generalize “why” something failed at the high level, but luckily the work we have in place for handling individual server error codes can be used here (i.e. we know which blackboard keys belong to error codes & have access to the blackboard).
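To illustrate the blackboard point, here is a minimal sketch that models the blackboard as a plain dict and scans a configured list of error-code keys for the first non-zero value. The dict stand-in and the key names are assumptions for illustration. A real BT blackboard has its own API.

```python
from typing import Mapping, Optional, Sequence, Tuple

# Hypothetical names of blackboard keys that hold subserver error codes.
ERROR_CODE_KEYS = ("compute_path_error_code", "follow_path_error_code")


def first_error_on_blackboard(
        blackboard: Mapping[str, int],
        keys: Sequence[str] = ERROR_CODE_KEYS) -> Optional[Tuple[str, int]]:
    """Return (key, code) for the first error-code key holding a non-zero value.

    Keys missing from the blackboard count as "no error". Returns None when
    no configured key reports a failure.
    """
    for key in keys:
        code = blackboard.get(key, 0)
        if code != 0:
            return key, code
    return None
```

If several keys are set, deciding which one wins is a policy choice. This sketch simply follows the order of the key list.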

There’s a good question, though, of what to expose and how. For instance, if we simply provide the error code from a navigation task server, you’ll get a code like 503. The application code would then need to know that the 500-range belongs to server “X”, which uses a particular action interface where the 3rd error code means “Y”. That would require high-level application code to be intimately aware of the constituent Nav2 servers and their implementations, which feels problematic.

We could address that by creating a C++ Navigation object that wraps an action client for Nav2 and stores the information needed to parse the error codes. The decoding would then come “included” for folks using Nav2 through that interface, and they wouldn’t need to be aware of it.
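Here is a minimal sketch of that wrapper’s decoding step, written in Python rather than C++ for brevity. It assumes, hypothetically, that each server owns a block of 100 codes (for example 500-599 for server “X”). The ranges, server names, and meanings are placeholders, not Nav2’s real assignments.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical code blocks: each server owns [start, start + 100).
SERVER_RANGES: Dict[str, int] = {
    "controller": 100,
    "planner": 200,
    "server_x": 500,
}

# Hypothetical per-server meanings, keyed by offset within the server's block.
MEANINGS: Dict[str, Dict[int, str]] = {
    "server_x": {3: "meaning Y"},
}


@dataclass(frozen=True)
class DecodedError:
    failed: bool   # just failure
    server: str    # class of failure (which server)
    detail: str    # exact failure, if known


def decode(code: int) -> DecodedError:
    """Translate a raw error code into failure / server / detail."""
    if code == 0:
        return DecodedError(False, "", "")
    for server, start in SERVER_RANGES.items():
        if start <= code < start + 100:
            detail = MEANINGS.get(server, {}).get(code - start, "unknown")
            return DecodedError(True, server, detail)
    return DecodedError(True, "unknown", "unknown")
```

With these placeholder tables, `decode(503)` yields server `"server_x"` with detail `"meaning Y"`. The three fields correspond to the levels of detail discussed in this thread (just failure, class of failure, exact failure), so an application can read only as much as it needs.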

However, I’m not sure whether an application really cares why a particular server failed, only that it was the controller/planner/behavior/etc. server that failed. This is where I’d like feedback on what kinds of information folks would want and how you’d use it, so we can figure out the right level of detail and content.

What kinds of errors would you want to know about at the autonomy layer that are actionable? Would it be:

  • Just failure?
  • The class of failure (controller, planner, etc)?
  • The exact failure (controller TF transformation failed repeatedly)?
  • Something else?

Thanks!

This is a very interesting topic. Since behavior trees can be complex, some failures are expected or automatically resolved by recoveries, so it’s tricky to determine which failures to report to the user.

As an example, consider a soccer robot that should find and then kick a ball:

<Sequence>
    <RecoveryNode>
        <FindBall/>
        <RotateLeft/>
    </RecoveryNode>
    <KickBall/>
</Sequence>

Let’s look at some error scenarios:

  1. There is no ball at all. Then FindBall will fail, and the user would expect an error message from that node.
  2. The ball is on the left, but the AMR is stuck and cannot move, so FindBall fails and RotateLeft fails too. The user would expect only the error from RotateLeft (?)
  3. The ball is on the left, so FindBall fails once and then, after the rotation, finds the ball. In the next step, KickBall fails for some reason. The user expects only the KickBall error to be reported.
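A minimal sketch of one collection policy that matches these scenarios: every failing leaf overwrites a “latest failure” slot, and that slot is reported only if the whole tree fails. The node classes and scripted outcomes are hypothetical stand-ins, not BehaviorTree.CPP or Nav2 APIs. The tree is ticked synchronously, with no RUNNING state.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Context:
    """Stand-in for the blackboard: holds the most recent leaf failure."""
    last_error: Optional[str] = None


class Action:
    """Hypothetical leaf with scripted outcomes, one bool per tick (True = success)."""

    def __init__(self, name: str, outcomes: List[bool], error: str):
        self.name, self.outcomes, self.error = name, list(outcomes), error

    def tick(self, ctx: Context) -> bool:
        ok = self.outcomes.pop(0)
        if not ok:
            # Record (overwrite) the latest failure; nothing is reported yet.
            ctx.last_error = f"{self.name}: {self.error}"
        return ok


class Sequence:
    """Succeeds only if every child succeeds; stops at the first failure."""

    def __init__(self, *children):
        self.children = children

    def tick(self, ctx: Context) -> bool:
        return all(child.tick(ctx) for child in self.children)


class RecoveryNode:
    """On main-child failure, tick the recovery and retry, up to `retries` times."""

    def __init__(self, main, recovery, retries: int = 1):
        self.main, self.recovery, self.retries = main, recovery, retries

    def tick(self, ctx: Context) -> bool:
        for attempt in range(self.retries + 1):
            if self.main.tick(ctx):
                return True
            if attempt == self.retries or not self.recovery.tick(ctx):
                return False
        return False


def run(root, ctx: Context) -> Optional[str]:
    """Report the latest failure only if the whole tree fails."""
    return None if root.tick(ctx) else ctx.last_error


def soccer_tree(find, rotate, kick):
    """The example tree, with scripted outcomes for each leaf."""
    return Sequence(
        RecoveryNode(Action("FindBall", find, "no ball found"),
                     Action("RotateLeft", rotate, "robot stuck")),
        Action("KickBall", kick, "kick failed"),
    )
```

Scripting the three scenarios, `run` reports the FindBall, RotateLeft, and KickBall errors respectively, and it reports nothing when the tree succeeds. “Latest failure wins” is only one possible policy. The “(?)” in scenario 2 shows that the expected behavior is itself open to discussion.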

So I think the BT node needs to be involved here, to collect errors and, only if the BT fails, report the relevant error(s) back to the user.

We could define the node error codes either in the XML (for simple nodes/errors) or, e.g. for nodes that can have different errors, have the nodes report their specific error codes when they fail. Ideally these would go into some .msg in an interface folder so that the other components of the system know which errors to expect from the BT.

If I understood @ajtudela correctly, we’re referring to total failures, not failures within the BT that were resolved. In other words, failures that persist even after the BT’s attempts to fix them, resulting in an ABORTED action result back to the action client.

We have this already, for all of the Nav2 Action servers!

I think we’re trying to discuss the specifics from the perspective of a client using NavigateToPose or NavigateThroughPoses. Some motivating questions:

  • What kind of failure information is actionable for your application / system?
  • How much granularity would be useful?
  • What are some examples of how you’d use that information to act in your application?

Getting some context on how it could be used will help uncover the types of things people want to use it for. Then we can try to make what we provide back as complete and general as possible!

Exactly. We’re interested in knowing the main reason why navigation failed.

In our particular case, knowing whether the failure came from the planner (because there is no path) or the controller (because the robot is stuck) is enough.


The purpose of providing these errors should be to let some other system use the information in its decision-making; perhaps a high-level system planner could get enough information to:
A) try a different planner
B) try a different path
C) explore more (depends on context)
D) go into some recovery mode / handle issue at different level

Based on this, I believe that error codes defined in the goal completion interface that cover these cases would be sufficient. Some errors (e.g. perception errors) will be difficult to interpret autonomously.

Some ideas for error codes though: planner failure (general error), path failure (no path could be found), unexplored path failure (no path, but sparse environment), subsystem failure (e.g. robot not moving).
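A minimal sketch of how a high-level system might consume the codes proposed here, mapping each one to one of the responses A-D listed earlier. The numeric values, the `decide` helper, and the particular code-to-response pairing are assumptions for illustration. The thread does not define any of them.

```python
from enum import Enum, IntEnum
from typing import Optional


class NavError(IntEnum):
    """Hypothetical goal-level error codes following the proposal."""
    NONE = 0
    PLANNER_FAILURE = 1           # general planner error
    PATH_FAILURE = 2              # no path could be found
    UNEXPLORED_PATH_FAILURE = 3   # no path, but sparse environment
    SUBSYSTEM_FAILURE = 4         # e.g. robot not moving


class Response(Enum):
    TRY_DIFFERENT_PLANNER = "A"
    TRY_DIFFERENT_PATH = "B"
    EXPLORE_MORE = "C"
    ESCALATE = "D"   # recovery mode / handle the issue at a different level


# One possible pairing of codes to responses; a real system would tune this.
POLICY = {
    NavError.PLANNER_FAILURE: Response.TRY_DIFFERENT_PLANNER,
    NavError.PATH_FAILURE: Response.TRY_DIFFERENT_PATH,
    NavError.UNEXPLORED_PATH_FAILURE: Response.EXPLORE_MORE,
    NavError.SUBSYSTEM_FAILURE: Response.ESCALATE,
}


def decide(code: int) -> Optional[Response]:
    """Return the response for a failure code, None on success; unknown codes escalate."""
    if code == NavError.NONE:
        return None
    try:
        return POLICY[NavError(code)]
    except ValueError:
        # Code not defined in NavError: hand it to a higher level.
        return Response.ESCALATE
```

Defaulting unrecognized codes to escalation keeps the consumer safe if the interface gains new codes before the high-level system learns about them.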