[Nav2] Feedback Required: Potential Change to Default Behavior

Hi all, it’s your friendly neighborhood navigator here.

I wanted to get Nav2 user feedback regarding a potentially large change to the behavior of Nav2. Since the dawn of time (err, I guess the dawn of ros-planning/Navigation) there has been a consistent issue that I myself have run into at multiple organizations. There are countless ROS Answers questions on this topic, and solving it came up as one of the feature requests when @mkhansen and co at Intel started off the project a few years ago.

That issue is path planning oscillation: the planner’s replanning frequency is set high enough to refine the path during normal execution, but also high enough to cause replanning down different corridors with similar distances to the goal when starting. This causes the robot to stop, turn around, and try to track the path in the opposite direction, only to replan back to the original route, and so forth. This is largely due to multiple routes having essentially the same cost to traverse, combined with replanning too frequently to make tangible progress down any particular avenue.

A common strategy might be to reduce the replanning rate substantially, but that comes at the cost of not being able to handle particularly dynamic settings effectively or react quickly to blocked passageways.

Thus, from work with @TheLaplacian in Nav2, we’ve come up with a solution we think outright resolves this problem, leveraging the power of configurable behavior trees we have in the Nav2 project.

The tl;dr of this work is that we added a BT that replans at a significantly reduced rate (every 15s), but in exchange also checks whether the existing path continues to be valid (e.g. not in collision with a newly detected obstacle). This means the robot won’t oscillate between similar path solutions in opposite directions, because it will “commit” to its course of action for much longer before trying to refine the path again. It can still replan ad hoc if strictly required. That ad-hoc replanning is throttled to 1 Hz, so that in extremely dynamic settings you’re not trying to replan every 10ms.
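
To make that rule concrete, here is a minimal Python sketch of the decision logic. It is not the actual BT or any Nav2 API: `ReplanPolicy`, `should_replan`, and the field names are made up for illustration, and only the 15s period and the 1 Hz throttle come from the description above.

```python
from dataclasses import dataclass


@dataclass
class ReplanPolicy:
    """Illustrative replanning rule; all names here are hypothetical."""

    periodic_interval_s: float = 15.0  # routine refinement period (from the post)
    min_replan_gap_s: float = 1.0      # 1 Hz throttle on ad-hoc replans
    last_plan_time_s: float = float("-inf")

    def should_replan(self, now_s: float, path_is_valid: bool) -> bool:
        """Return True if a new plan should be requested at time now_s."""
        elapsed = now_s - self.last_plan_time_s
        if elapsed < self.min_replan_gap_s:
            # Throttled: never replan faster than 1 Hz, even if the path is invalid.
            return False
        if not path_is_valid or elapsed >= self.periodic_interval_s:
            # Replan because the path is blocked or the periodic timer expired.
            self.last_plan_time_s = now_s
            return True
        # Path is still valid and the timer has not expired: stay committed.
        return False


if __name__ == "__main__":
    policy = ReplanPolicy()
    timeline = [(0.0, True), (5.0, True), (5.5, False),
                (5.6, False), (6.5, False), (21.5, True)]
    print([policy.should_replan(t, valid) for t, valid in timeline])
    # prints [True, False, True, False, True, True]
```

The first call plans because nothing has been planned yet; the invalid path at 5.5s triggers an immediate replan, the one at 5.6s is throttled, and the valid path at 21.5s is replanned only because 15s have passed since the last plan.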

I think this might be worth evaluating on your robots and providing feedback to us as the maintainers. I believe this will resolve some of your behavioral issues, and it would be of significant value if we could get consensus or feedback from users to know whether this fully solves the concerns or whether there’s more refinement we should do. It is our intent to make this the default behavior of Nav2 going forward, so that no user will have to deal with the pain of path oscillation, which many of us have had to work around internally with various solutions over time.

Any feedback is valuable, but areas I think in particular that would be good to know:

  • How do you like 15s for replanning? We more or less picked it out of a hat, knowing we wanted something between 10 and 30s. Happy to adjust if that is not appropriate.
  • Would it be helpful if we included not only a “path invalid” check but also a “path is more expensive by X%” check? Please answer based on experience from real use, not just the philosophy of “that sounds like a good idea”.
  • Any other thoughts or issues you run into

Happy behavioring,

Steve

Once a plan is selected, it seems a reasonable solution to add a cost to changing plans, which prevents oscillation between relatively equal plans; alternatively, a new plan would need to be lower in cost by some minimum percentage and/or amount before changing plans.
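
As a sketch of what such a switching threshold could look like, the Python function below only accepts a new plan if it beats the current one by both a minimum fraction and a minimum absolute amount. The function name, parameter names, and the 10% default are illustrative assumptions, not anything that exists in Nav2.

```python
from typing import Optional


def should_switch_plan(
    current_cost: Optional[float],
    candidate_cost: float,
    min_relative_gain: float = 0.10,  # assumed example value: 10% cheaper
    min_absolute_gain: float = 0.0,   # assumed example value, in cost units
) -> bool:
    """Hysteresis on plan switching: keep the current plan unless the
    candidate is cheaper by the configured margins."""
    if current_cost is None:
        # No usable current plan (none yet, or it became invalid): take the new one.
        return True
    gain = current_cost - candidate_cost
    return (
        gain > 0.0
        and gain >= min_absolute_gain
        and gain >= min_relative_gain * current_cost
    )


if __name__ == "__main__":
    print(should_switch_plan(100.0, 95.0))  # prints False (only 5% cheaper)
    print(should_switch_plan(100.0, 85.0))  # prints True (15% cheaper)
    print(should_switch_plan(None, 120.0))  # prints True (nothing to keep)
```

Requiring both margins is one choice; an "or" between them would be the looser variant of the same idea.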

I have thought about a similar scheme before, and I think I even described it in detail on an issue tracker in Nav2 at some point (though I can’t seem to find it at this very moment). I wish I could find it to recall all of the context.

The major issue with that is that, many times, a refined path during execution is going to be “better” in terms of some metric (distance from obstacles, smoothness, etc.) due to new information and the state of the robot during the path tracking task. Rejecting such updates is not the purpose of this work.

Either way, we’d need to evaluate the current plan’s suitability before computing this metric, because if the current path is no longer valid then, no matter what (probably), we need to stop tracking it and replan. From a design perspective, once you acknowledge that, it becomes more advantageous to check for the invalidity event when it occurs rather than on a polling basis, to get the fastest possible response time to conditions on the ground. Once that decision is made, it becomes less clear that a scoring function is really the right option, since we’re operating on an event-based cycle rather than a polling one. While in practice we’re still technically polling, we’re polling at such a fast rate that it is effectively event-based, rather than being throttled by replanning times.

I think, though, that the concept has merits, and I’m still trying to work out the best way to merge the two ideas. Hence the question in my original post about a “path is more expensive by X%” check, which would add in the cost function for checking for validity. But I’d like to have folks test this in real environments first and tell me it is required before complicating what is otherwise a relatively straightforward, parameter-less feature. Sometimes the simple solutions work unreasonably well!

This may not be what you are solving for, but in practice we run into a case where the robot is halfway down an aisle and a stationary temporary obstacle is blocking its path (e.g. a person, shopping cart, pallet loader, or forklift), hence the path is invalid. The cost to complete another plan is mathematically lower than that of the current plan; however, if the obstacle were to move and allow passage, the current plan may be better.

What we are trying to prevent is the unnatural behaviour where a robot on plan A, replans to plan B because the path is temporarily blocked, only to return to plan A when the path is cleared; this wastes energy, and is awkward behavior to a human in the operational domain.

Thanks

This is 100% a real, but different, behavior that is on our radar. We have the opening salvo to that issue in https://github.com/ros-planning/navigation2/pull/2802, which is still in the works. It doesn’t outright solve that issue in the way the work described in this post hopefully fully solves the oscillation issue, but it moves the needle forward.

I’d be grateful if you wanted to continue this discussion in a Nav2 thread and we can see if we can’t come up with some ideas that would solve that as well. Right now my thinking is to have the robot “pause”, which is often a positive behavior for a robot in this situation and then try again after some time or an event occurs. I think the robot stopping to think is better than being a bit ADHD in continuously moving when it simply needs to see if the situation resolves itself.

The hardest part of this is deciding which replans are due to a situation where it would be positive to pause versus immediately replanning and going on your way. To side-step that problem, the current PR solves a sub-problem of that general behavior: it pauses only on approach to a final goal pose, a more restricted context where it is generally good for a robot to pause since it’s 99% of the way to the goal.

Here’s a video of the behavior tree in action, @ggrigor, from the discussion on pausing near the goal if obstacles are in the way. It shows 2 examples from @Pradheep_krishna:

  1. A “dynamic” obstacle is in the way: the robot will stop, pause, and then go around if the obstacle has not yet moved out of the way

  2. A “dynamic” obstacle is in the way, then moves out of the way: after pausing, the robot continues on the original route

For the general solution to this type of problem, the primary thing is to find metric(s) that decide which replans are due to this type of issue, in order to pause the robot (or always pause the robot if replans are too “significant”? That might not be ideal, but it is something to consider). This BT only does this when in close proximity to the goal, since that is one situation where pausing might be helpful if the robot is being asked to go around another way. Finding a metric for “this path goes around another direction” is, I think, the most important thing, and it might involve some analysis not only of the plan comparisons but of the map itself. Perhaps computing a Voronoi diagram (or similar) on the static map and caching it, to classify differences in “route classes”.
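
As a very rough stand-in for such a metric, the Python sketch below compares the old and new plans with a discrete Hausdorff distance and flags the replan as a "different route" when the two paths stray further apart than a threshold. This is a simpler swapped-in technique, not the Voronoi or route-class analysis suggested above, and it ignores map topology entirely. The function names and the 2 m threshold are assumptions for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def directed_hausdorff(path_a: List[Point], path_b: List[Point]) -> float:
    """Largest distance from any point of path_a to its nearest point on path_b."""
    return max(
        min(math.hypot(ax - bx, ay - by) for bx, by in path_b)
        for ax, ay in path_a
    )


def is_different_route(
    old_path: List[Point],
    new_path: List[Point],
    threshold_m: float = 2.0,  # assumed example value, roughly a corridor width
) -> bool:
    """Flag a replan as a different route if the paths diverge by more than threshold_m."""
    distance = max(
        directed_hausdorff(old_path, new_path),
        directed_hausdorff(new_path, old_path),
    )
    return distance > threshold_m


if __name__ == "__main__":
    original = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
    refined = [(0.0, 0.0), (1.0, 0.2), (2.0, 0.2), (3.0, 0.0)]
    detour = [(0.0, 0.0), (0.0, 3.0), (3.0, 3.0), (3.0, 0.0)]
    print(is_different_route(original, refined))  # prints False
    print(is_different_route(original, detour))   # prints True
```

A purely geometric threshold like this would misclassify wide open spaces and narrow parallel aisles, which is part of why analysis of the map itself seems worthwhile.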

Bumping this topic – I wanted to give a second call for feedback on the initial post! We still have not received any meaningful feedback from users on these significant changes.

This somehow reminds me of the problem you see with path planning for six-axis robot arms. For some goals, depending on flipping some joints, you can go either a “left” or a “right” way. I think that most path planners are hard-coded to always choose the “right” way.
But this is dangerous half-knowledge on my part and only slightly relates to the Nav2 problem in this instance. Is the problem you describe occurring while the robot is in motion, or only during initial goal setting?

Personally, I find 15s a little bit too long. I would really like my robot to be semi-dynamic. If some people decide to jump around in front of a robot, it should not respond instantly and so encourage such dynamic obstacles to keep playing with it. So a wait of a few seconds seems right, but my gut would vote for 5s rather than 15s. Is this already a clear YAML launch parameter, or is it hidden from most ROS users inside a behavior tree XML?

Absolutely! This reminds me a lot of Google Maps navigation. There are always alternative routes shown with “This route takes X minutes longer”, and often quite a few with “This route is equally fast”. But for the equally fast option, Google just keeps the original route. So I would really only like a route to be switched if there is at least a 10-15% improvement over the original route. This topic becomes especially relevant once there is a cloud costmap, where multiple mobile robots operate on one common costmap. I am actually surprised that this is a relevant topic for the rather limited view of a single robot in a large environment.

Why not keep multiple BTs? I can imagine them ranked in the navigation2 documentation with a spider graph to showcase their individual strengths and weaknesses. But this only makes sense if such a change reduces other characteristics, like faster dynamic response.
Of course, a default setting must always be set, but it might be convenient for advanced users?

Always, but most commonly when starting toward the goal. There are certainly situations where it can be caused mid-execution, such as when dealing with partially or dynamically blocked passages like in a retail store or warehouse.

This will replan if the existing plan becomes invalid, so if someone does jump in front of it, it will replan. That process is also still throttled to 1 Hz, so in the worst case you replan like you do currently. There is no “pausing” in this work; that was an off-topic discussion about a separate behavioral update that will pause the robot. See the original PR.

But yes, there is an XML field where you can easily swap 15 for 5 if you like.
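
As a rough illustration of that kind of edit, the Python sketch below patches a period attribute in a behavior tree XML string using only the standard library. The tag `ReplanTimer` and the attribute `seconds` are placeholders, not the names used in the actual Nav2 tree; check the XML file you are running for the real ones.

```python
import xml.etree.ElementTree as ET

# Hypothetical tree fragment: the node and attribute names are placeholders.
EXAMPLE_BT_XML = """
<root>
  <BehaviorTree ID="MainTree">
    <ReplanTimer seconds="15"/>
  </BehaviorTree>
</root>
""".strip()


def set_replan_period(bt_xml: str, node_tag: str, attribute: str, seconds: float) -> str:
    """Return a copy of bt_xml with `attribute` set on every <node_tag> element."""
    root = ET.fromstring(bt_xml)
    nodes = list(root.iter(node_tag))
    if not nodes:
        raise ValueError(f"no <{node_tag}> element found in the behavior tree")
    for node in nodes:
        node.set(attribute, str(seconds))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Swap the 15s period for 5s in the example fragment.
    print(set_replan_period(EXAMPLE_BT_XML, "ReplanTimer", "seconds", 5))
```

In practice a text editor does the same job; the point is only that the period lives in the tree XML rather than in a YAML parameter.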

There is / we do: https://github.com/ros-planning/navigation2/tree/main/nav2_bt_navigator/behavior_trees. This is the default behavior we’re talking about, though.

Hello, on top of solving the issue you described, the proposed change might also work for another issue that we encountered and solved in a similar way (in ROS 1).
This issue happens when the robot encounters an obstacle blocking a corridor: it will replan and leave the corridor, and when the obstacle goes out of view it might plan through the blocked corridor again, and so on.

So personally I believe this would be a good change.

We also just decided to shorten the 15s replanning period to 10s after doing a little further evaluation in simulation. We feel 5s is too short for the behavior we’re looking for, but 10s is a good middle ground.

FWIW, we’re not using the default BT anyway, so there is certainly no negative impact.

Regarding making that the default, I’m all for it. It seems the more reliable approach. The only small downside might be that you can’t show off how “smart” the robot is by removing an obstacle from the shortest path :wink:

That said, regarding having this as a long-term solution, let me offer an observation. I think what we want is to be able to say “once you made a plan, stick with it unless there is a really important reason not to”.

The proposed solution doesn’t implement this directly. We are just hoping that whatever replanning period we choose will be long enough that the progress made along the original path ensures the robot sticks with it. This strategy has some assumptions – for example, we’re assuming that the environment model doesn’t change drastically in those regions we currently don’t see, and which presumably contain the “other” path option. This is a reasonable assumption for the default setup, but I’ve seen setups where robots exchange environment information amongst each other, or get their map from a central server. These would invalidate that assumption.

Again, this doesn’t speak against making your proposal the default. The people using those kinds of setups are likely experts who know what they’re doing and are probably not using the default anyway, or if they are, they know how to cope with issues.

However, what I want to get at is that currently, the nav2 stack doesn’t support what we want directly. And I’ve been wondering what would be involved in making this kind of statement be possible. Maybe have some kind of history, or be able to detect how significant a path change is (and possibly then using a larger threshold, like a hysteresis). Just some food for thought.

+1. This is aligned with the intent from our experience in factory/warehouse environments.

Thanks.

Absolutely, I’m making these decisions for one robot’s data and one robot’s task. The multi-robot case is definitely different in a number of aspects beyond just the BT, so I do assume that multi-robot users are ripping out algorithms and BT configurations for more appropriate options. We may eventually want to explore direct multi-robot algorithmic/configuration support, but this is outside the scope of the current task.

Thanks for the input though, that’s valuable context!

Precisely. Also, since the world does realistically change over time, we don’t want to plan once and just assume everything is exactly how we left it the last time we did mapping. The casual / occasional replanning adds the benefit of refining a path based on the most recent set of data, but at a frequency that would not create oscillations. The path being invalid (e.g. in collision) is the “really important reason not to” that triggers a replan when required, so I think that part of the spec is the same.

In some professional settings, I think you could remove the replanning if you knew that things don’t change, but I don’t think that’s a good default assessment for every Nav2 user (education, research, startups, non-warehouses, etc). I think for the warehouse example that might be a very reasonable thing to do, but I would like reasonable out of the box behavior for everyone, which then can obviously be refined for specific tasks (especially at the professional level). In fact, that’s part of the motivation why we keep a library of behavior trees around to illustrate some ideas of modifications that users might want to be aware of.

“I would like reasonable out of the box behavior for everyone”

I would argue that this goal is not achievable – at least not with one BT.

That may be a strong statement, but I had to learn the hard way that there are some practical situations that violate so many assumptions that you simply can’t accommodate them with settings that are otherwise reasonable – and with something as flexible as BTs, and the current ease of switching them out for one another, why should we?

One example is that sometimes online planning is forbidden entirely. Not even once. Instead, there are a number of predefined paths, and robots are supposed to follow them exactly. When they can’t, they must stop until the path is clear again. To achieve this, we need a very specific configuration, including for DWB, which makes very little sense otherwise. Most people don’t want a robot that stops at the slightest disturbance.

The aim is to have reasonable out-of-box behavior; I don’t think that’s at all an unachievable goal :wink:

I understand commercial users are going to need really specific behaviors based on their particular situations, and they have the expertise to do that easily, but keep in mind that researchers, students, hobbyists, and early stage startups use this library as well. The better it works out of the box, the more invested they are in continuing to use the system and submitting new features over time, rather than saying “this sucks” right out of the gate and giving up.

While for any exact behavior, users will need to create custom behavior trees (and probably nodes) and we make this easy via plugins and configuration files, it is always preferable to be function-forward and highlight a good user experience so that folks want to work with it.

Imagine if we didn’t provide any BTs at all! Think of how few users it would have due to the technology hurdle of just getting started. I think there is significant value in having great out-of-box experiences for folks to later customize.

I think we have the same understanding here. The default behavior should cover as many situations as possible in a reasonable way but it won’t solve everything perfectly.

I filed a ticket around a more general behavior solution: “Behavior tree: pause if 'different path solution' found before major local diversion” (Issue #2877, ros-planning/navigation2). I’d appreciate it, @ggrigor, if you wanted to comment with your thoughts / experiences so that we can take that into account in design. Additionally, I’d love it if you / your team wanted to get involved with Nav2, since it seems that your work is integrating with it!