[Nav2] Request for Comment -- Route Server

Hi all, your friendly neighborhood navigator here!

I wanted to gather some feedback / requirements / use-cases / needs on a project I intend to start working on early in 2023.

Project: Nav2 Route Server

Description:

This server will complement the Planner Server to generate global routes through the environment. Rather than using free-space planning, it will take in a pre-generated Navigation Graph to plan within, enabling reduced planning times and repeatable navigation behaviors. Some common applications are to (1) create directional lanes or robot lanes in spaces so that a robot is routed only through appropriate zones, (2) route outdoors or in non-trivially non-planar environments where an occupancy grid isn't a reasonable representation to plan within globally and accurate global height maps are not available (e.g. most outdoor applications), and (3) plan within excessively large environments where free-space planning from the current pose to a goal isn't feasible with any current method within a reasonable timeframe, requiring a reduced representation to plan within.

Requirements:

  • Enable directional and non-directional edges in the navigation graph
  • Allow the navigation graph to be independent of generation method (e.g. PRM, hand-assembled, from automated algorithm based on visibility, etc)
  • Set weights for graph edges and/or run-time loaded plugin to score edges
  • Standardized navigation graph file format
  • Handle last-mile free-space planning to the goal pose from the navigation graph if the goal is not on a graph node
  • BT plugin and Python3 API access
  • Use an optimal heuristic search algorithm and/or optionally Dijkstra's / BFS (does anyone really want a plugin interface for different planners, or is just A*/Dijkstra's fine?)
  • GPS and Cartesian coordinate storage
  • Visualize navigation graph / edge weights in rviz
  • The usual test coverage and documentation you've come to know and love about Nav2 packages
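To make the graph-related requirements concrete, here is a minimal Python sketch of a navigation graph that stores directed edges with static weights, expands a non-directional edge into two directed ones, and carries optional GPS coordinates next to Cartesian ones. All class and field names are hypothetical illustrations, not a proposed Nav2 API or file format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    node_id: int
    x: float  # Cartesian position in the map frame (metres)
    y: float
    lat: Optional[float] = None  # optional GPS coordinates for outdoor graphs
    lon: Optional[float] = None


@dataclass
class Edge:
    edge_id: int
    start: int
    end: int
    weight: float = 1.0  # static cost from the graph file; plugins could adjust it


@dataclass
class NavGraph:
    nodes: Dict[int, Node] = field(default_factory=dict)
    adjacency: Dict[int, List[Edge]] = field(default_factory=dict)
    next_edge_id: int = 0

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node
        self.adjacency.setdefault(node.node_id, [])

    def add_edge(self, start: int, end: int, weight: float = 1.0,
                 bidirectional: bool = False) -> None:
        """Store a directed edge; a non-directional edge becomes two directed ones."""
        pairs = [(start, end), (end, start)] if bidirectional else [(start, end)]
        for a, b in pairs:
            self.adjacency.setdefault(a, []).append(
                Edge(self.next_edge_id, a, b, weight))
            self.next_edge_id += 1

    def successors(self, node_id: int) -> List[int]:
        """Nodes reachable from node_id by following a single edge."""
        return [edge.end for edge in self.adjacency.get(node_id, [])]
```

For example, after `g.add_edge(0, 1, bidirectional=True)` (a two-way corridor) and `g.add_edge(1, 2)` (a one-way lane), the robot can go from node 1 to node 2 but never back. Storing everything as directed edges keeps the search code identical for both kinds of lanes.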

I'm interested to hear from users about any additional applications they may have for this server, and any additional features or important items I should take into account as I'm designing this server or evaluating existing options for adoption.

Happy routing-sometime-in-2023,

Steve

I think this is a great set of requirements. I have a few more to recommend based on our experiences with RMF:

  • Allow lanes to be manually closed and reopened during runtime
    • Bi-directional lanes can be closed along one or both directions
    • Useful for optimizing traffic flow based on conditions on the ground, e.g. have more one-way lanes going in one direction than another to accommodate unusually heavy traffic flow in one direction
    • Useful for routing traffic away from temporary hazard zones, e.g. maintenance work is happening
  • Provide the route server with occupancy data
    • Automatically close lanes that are occupied by obstacles
    • Alert that a robot needs to be rerouted when a new obstacle appears
    • Use free space planning to move a robot onto the navigation graph from its starting position (in addition to off the graph to its goal, as you already mentioned)
  • Allow speed limits (recommended or expected speed) to be specified per lane
    • Can be factored into the cost calculation, assuming travel time is at least one factor in the cost
    • Yields better predictions of travel/arrival timing
  • Consider how to support interaction events, e.g. open an automated door, use an elevator
    • For large facilities, the navigation problem is inherently intertwined with these interactions
    • We not only need to plan for these interactions, but also control the motion of the robot in sync with commands to these external entities, e.g. command the robot to approach the door, then wait until the robot is close before commanding the door to open, then wait until the door is fully open before commanding the robot to move through the doorway.
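A small sketch of the first and third suggestions, assuming lanes are keyed by (from, to) node pairs: each direction of a bidirectional lane can be closed and reopened independently, and the edge cost is travel time computed from the lane length and its speed limit. `LaneTable` and its methods are hypothetical names for illustration only.

```python
import math
from typing import Dict, Optional, Set, Tuple

DirectedLane = Tuple[str, str]  # (from_node, to_node)


class LaneTable:
    """Hypothetical runtime lane registry with per-direction closures and speed limits."""

    def __init__(self, positions: Dict[str, Tuple[float, float]]):
        self.positions = positions
        self.speed_limits: Dict[DirectedLane, float] = {}  # metres per second
        self.closed: Set[DirectedLane] = set()

    def add_lane(self, a: str, b: str, speed_limit: float,
                 bidirectional: bool = True) -> None:
        self.speed_limits[(a, b)] = speed_limit
        if bidirectional:
            self.speed_limits[(b, a)] = speed_limit

    def close(self, a: str, b: str, both_directions: bool = False) -> None:
        self.closed.add((a, b))
        if both_directions:
            self.closed.add((b, a))

    def reopen(self, a: str, b: str, both_directions: bool = False) -> None:
        self.closed.discard((a, b))
        if both_directions:
            self.closed.discard((b, a))

    def travel_time(self, a: str, b: str) -> Optional[float]:
        """Seconds to traverse a->b at its speed limit, or None if the lane is unusable."""
        lane = (a, b)
        if lane not in self.speed_limits or lane in self.closed:
            return None
        (xa, ya), (xb, yb) = self.positions[a], self.positions[b]
        return math.hypot(xb - xa, yb - ya) / self.speed_limits[lane]
```

Closing only `("A", "B")` turns a two-way lane into a one-way lane from B to A, which covers the traffic-flow use case; closing both directions covers the temporary hazard zone.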

On the topic of search algorithms, I've found that the easiest way to get good performance for navigating among static obstacles is bidirectional Dijkstra where you cache and reuse the forward and reverse search trees. For navigating among moving obstacles, you can use the aforementioned static obstacle search as a heuristic with A* to guide you around the moving obstacles. That being said, I'm investigating the use of Lifelong Planning A* and Safe Interval Path Planning as possible improvements over those techniques, at the cost of being more difficult to implement.
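For reference, a plain bidirectional Dijkstra over a directed graph with non-negative costs looks roughly like this. It is a sketch only: it does not implement the caching and reuse of the forward and reverse trees mentioned above, and the graph representation is an assumption.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # node -> [(neighbor, cost)]


def reverse_graph(graph: Graph) -> Graph:
    rev: Graph = {n: [] for n in graph}
    for u, edges in graph.items():
        for v, w in edges:
            rev.setdefault(v, []).append((u, w))
    return rev


def bidirectional_dijkstra(graph: Graph, source: str,
                           target: str) -> Optional[Tuple[float, List[str]]]:
    """Return (cost, path) of a shortest path, or None if target is unreachable."""
    if source == target:
        return 0.0, [source]
    adj = [graph, reverse_graph(graph)]       # 0 = forward, 1 = backward search
    dist = [{source: 0.0}, {target: 0.0}]
    parent = [{source: None}, {target: None}]
    heaps = [[(0.0, source)], [(0.0, target)]]
    best, meet = float("inf"), None
    while heaps[0] and heaps[1]:
        # Stop once no path through either frontier can beat the best found so far.
        if heaps[0][0][0] + heaps[1][0][0] >= best:
            break
        side = 0 if heaps[0][0][0] <= heaps[1][0][0] else 1
        d, u = heapq.heappop(heaps[side])
        if d > dist[side][u]:
            continue  # stale heap entry
        for v, w in adj[side].get(u, []):
            nd = d + w
            if nd < dist[side].get(v, float("inf")):
                dist[side][v] = nd
                parent[side][v] = u
                heapq.heappush(heaps[side], (nd, v))
            # A node reached by both searches is a candidate meeting point.
            if v in dist[1 - side] and dist[side][v] + dist[1 - side][v] < best:
                best, meet = dist[side][v] + dist[1 - side][v], v
    if meet is None:
        return None
    # Stitch source..meet (forward tree) with meet..target (backward tree).
    path, n = [], meet
    while n is not None:
        path.append(n)
        n = parent[0][n]
    path.reverse()
    n = parent[1][meet]
    while n is not None:
        path.append(n)
        n = parent[1][n]
    return best, path
```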

Howdy!

I think this is a super interesting topic, and I hope to make a small contribution by opening up the work that Toyota Research Institute, Open Robotics, and Ekumen have been doing for years now. The project is called maliput, and it aims to provide a reference SDK for road networks and autonomous driving simulation.

Most of the requirements listed in the description above are native features of the runtime API that the framework offers. The framework API offers two distinct frames, the inertial world frame and a lane frame (a non-isotropic traffic lane bound volume frame), which allow agents to localize themselves and other agents in a semantic and meaningful space for navigation. There is also support for traffic rules, which can be used to model the complexity of each local legislation and phase state transitions. Other features include traffic lights and related traffic rules, an object API, a built-in plugin system for different backend implementations, almost 100% Python API bindings by means of pybind11, and a ROS2-compatible build system (and deployment).

I am not familiar with the quality level of Nav2, but I can comment on the quality measures used in maliput and its family of packages:

  • seeks 100% public API unit test coverage
  • gcc and clang compatible builds
  • integration tests and evaluation in customized simulator
  • periodic use of static analyzers

maliput has a handful of backends already. One is based on the OpenDRIVE standard, and another one (under construction, but not far from completion) integrates with lanelets and Open Street Map maps.

There is currently a naive routing API implementation on top of the road network that allows creating a path between two given coordinates. Simply put, we have not had time to prioritize developing a powerful router capable of merging all the features (geometric and logical constraints) based on a dynamic cost evaluation, but it is on the roadmap.


About requirements...

This is an interesting requirement for your server. Certain agents may benefit from it while others may simply discard any information they get from it. For example, a parking zone may or may not be a free space with no navigation rules at all (even across different physical levels), and a global routing server could provide no information whatsoever about the area beyond its limits.

I would also suggest considering a built-in coordinate conversion system (probably provided by one of the multiple tools out there). It can certainly affect your path choice, as path lengths become more important at scale.
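As a rough illustration of why conversion matters for path lengths, the sketch below computes the great-circle (haversine) distance between two GPS points and a simple equirectangular projection into a local metric frame. This is a spherical-Earth approximation chosen for brevity, not a full WGS84 geodesic; a dedicated geodesy library would be the safer choice in production.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius; spherical approximation


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two lat/lon points (degrees)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def to_local_xy(lat: float, lon: float, lat0: float, lon0: float) -> tuple:
    """Equirectangular projection around (lat0, lon0): x east, y north, in metres.

    Only accurate close to the origin; errors grow with distance and latitude span.
    """
    x = math.radians(lon - lon0) * math.cos(math.radians(lat0)) * EARTH_RADIUS_M
    y = math.radians(lat - lat0) * EARTH_RADIUS_M
    return x, y
```

Near the origin the two agree to well under a centimetre over ~100 m, but the planar approximation drifts as the graph spans larger areas, which is where a proper conversion starts to change which route looks shortest.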


I've done just a bit of research (only scratching the surface) on this topic (routing and flow optimization in traffic) during my master's program. When thinking about optimizing flow at the scale of thousands of agents or more, it's important to also consider the type of agents and a different cost function for each of them (e.g. from the speed limit point of view, you don't want to mix a car and a bike in one lane, they are better off in different lanes; from the curvature point of view, a small street may not be suitable for a truck but could be suitable for a car), and so on.

Another interesting thing I found is that systems of this sort can be used to optimize policies (traffic rules) and road geometries at the design phase. I wouldn't be surprised if construction-related research groups are already using tools that provide many of the capabilities in your feature requests.

Finally, I would like to mention that in the space of game theory and operations research, auction-based servers are super interesting, as agents may bid to occupy the space based on their own costs rather than deferring the planning task to a centralized entity. In those cases, the search and optimization tasks are deferred to the agents, and servers may be used to hold the state (and also the future state estimation) of the graph and its costs and to resolve the bids.


I look forward to hearing other references and experiences in building systems of this sort.

Agustin

Hi,

I wanted to let this sit open for a while so that we could gather some ideas before I started to respond.

I'm thinking a plugin interface to live-augment the edge costs could be beneficial for this and other reasons.

I think this falls into the same category. If we have a pluginlib interface to score edges, one such implementation could add an insane score if the edge is occupied, or provide an option to fully close it rather than adding a numeric infinite cost.
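A Python stand-in for what such an edge-scoring plugin interface might look like: each scorer returns an additive cost, or `None` to close the edge outright, and the server sums the static file cost with every plugin's contribution. `EdgeScorer`, `OccupancyScorer`, and `total_edge_cost` are hypothetical names; a real implementation would be a C++ pluginlib base class.

```python
import math
from abc import ABC, abstractmethod
from typing import List, Optional, Set


class EdgeScorer(ABC):
    """Hypothetical plugin interface: extra cost for an edge, or None to close it."""

    @abstractmethod
    def score(self, edge_id: int, start: int, end: int) -> Optional[float]:
        ...


class OccupancyScorer(EdgeScorer):
    """Closes occupied edges, or punishes them with a large finite penalty."""

    def __init__(self, hard_close: bool = True, penalty: float = 1e6):
        self.occupied: Set[int] = set()  # edge ids, e.g. filled from a costmap check
        self.hard_close = hard_close
        self.penalty = penalty

    def score(self, edge_id: int, start: int, end: int) -> Optional[float]:
        if edge_id not in self.occupied:
            return 0.0
        return None if self.hard_close else self.penalty


def total_edge_cost(static_cost: float, edge_id: int, start: int, end: int,
                    scorers: List[EdgeScorer]) -> float:
    """Static file cost plus every plugin's contribution; inf means the edge is closed."""
    cost = static_cost
    for scorer in scorers:
        extra = scorer.score(edge_id, start, end)
        if extra is None:
            return math.inf
        cost += extra
    return cost
```

Returning `None` rather than a huge number lets the search skip closed edges entirely, while the finite-penalty mode still allows a route through an occupied edge when there is no alternative.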

Mhm, this is interesting. I'm not sure how I'd enable that cleanly, but it's something to consider during design. Having the route planner, whose goal is to create a route through an environment (which isn't even kinematically feasible), impact the vehicle dynamics feels like an unnatural separation of concerns to me. With that said, we have a topic that the controller server listens to which automatically adjusts the speed limit of the controllers, and which is used by the Speed Filter, so it would be easy to implement. But then that expects the route server to be running regularly to update the speed dynamically based on the current edge, instead of calling the route server once to get a route to follow (or replanning when required due to blockages). The actual publication of the speed limit is trivial and I don't have much of an issue with that; it's more that it implies we basically have to continuously evaluate the route server in order to adjust the speed limit.

For that reason, this feels like it better belongs elsewhere. But it might turn out that in a real-life implementation it makes sense to have the route server doing something in the background during execution. If that becomes the case, then it would be very natural to have the speed publisher there. But I don't suspect that will be the case. I'm not sure what the Route Server would need to be doing live during route execution.

I'd like to hear some thoughts on this from other users - is this a valuable aspect? It feels more suited to behavior design in the behavior tree, but I could see why the Route Server would also be a place of potential interest. It falls into the same category as the speed limits to me: it implies that the Route Server is doing stuff during execution as a long-running process instead of just providing a route on request.

I'm not entirely opposed to that breakdown of the problem, but I'd want to understand all of the "things" one might want to do live during execution to see how to define an interface so that people can do what they need to do. It sounds like actions on entering/leaving a node (open a door) and an edge (change speeds) would be good, at a minimum. The file format would also become far more complicated if it needs to embed more than spatial information regarding nodes, edges, and their relative costs. This becomes more than just a Planner Server analog and becomes a pseudo-autonomy system, which I feel is feature creep. But that doesn't mean there can't be an augmentation of the Route Server which does a bit more; it may be sensible to have the nav2_route_server contain a couple of programs: one that does the simpler route finding, analogous to the Planner Server, and another that uses it as a base implementation but also has a long-running server to dish out edge / node entrance / exit commands like those described above. That would be a natural separation and let people "take what they want, leave what they don't". I'd probably start on the base first though - the file format bit seems like a headache if we want to enable "any" operation and not just a set of pre-defined operations.

Do you notice that these really matter all that much? I'm thinking a graph with ~1000 edges would still be trivially fast to search with A* or Dijkstra's, nothing fancy. Something like the Smac Planner does quite complex planning using generic A* over 100k nodes in well under 100ms - I'm not sure the route server would be even remotely as complex, and it would have far fewer nodes.
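For a sense of how little machinery is needed, here is a plain A* over a navigation graph using straight-line distance as the heuristic. It assumes every edge cost is at least the Euclidean distance between its endpoints (so the heuristic stays admissible and consistent); with costs like travel time in seconds, the heuristic would need rescaling accordingly.

```python
import heapq
import math
from typing import Dict, List, Optional, Tuple


def astar(positions: Dict[int, Tuple[float, float]],
          edges: Dict[int, List[Tuple[int, float]]],
          start: int, goal: int) -> Optional[List[int]]:
    """Return the node sequence of a shortest route, or None if goal is unreachable."""

    def h(n: int) -> float:
        (x1, y1), (x2, y2) = positions[n], positions[goal]
        return math.hypot(x2 - x1, y2 - y1)

    g = {start: 0.0}
    parent: Dict[int, Optional[int]] = {start: None}
    open_heap = [(h(start), start)]
    closed = set()
    while open_heap:
        _, u = heapq.heappop(open_heap)
        if u == goal:
            path = []
            node: Optional[int] = u
            while node is not None:  # walk parents back to the start
                path.append(node)
                node = parent[node]
            return path[::-1]
        if u in closed:
            continue  # already expanded via a cheaper entry
        closed.add(u)
        for v, w in edges.get(u, []):
            ng = g[u] + w
            if ng < g.get(v, math.inf):
                g[v] = ng
                parent[v] = u
                heapq.heappush(open_heap, (ng + h(v), v))
    return None
```

On a 30x30 grid graph (900 nodes, roughly 3,500 directed edges), this finds the 58-edge corner-to-corner route using nothing beyond the standard library.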

Most of the incremental / building-on-previous-runs planners capture spaces that are not largely static poorly: they react badly to the current conditions on the ground and get too stuck in old path solution spaces. I'd be more than happy to implement a more interesting / complex search system if there are requirements for it, but if speed is the only concern, I think the base algorithms in our toolboxes are going to be plenty fast for us.


This work does not aim to support multiple robot systems for central or decentralized fleet management. This assumes we're only caring about the behavior of a single robot. The technology in this field for multirobot planning / fleet management / traffic conflict resolution isn't mature or generalizable enough for us to include into Nav2 at this time. Plus, companies that care about such things are going to be significantly better resourced and will ultimately design their own systems to optimize for their application spaces. There are still plenty of use cases for single robot planning spaces, as evidenced by the number of folks using the existing planning systems in Nav2. Nothing in Nav2 precludes multi-robot-ing, but it's not the primary driver of the algorithms/plugins we build inside of it today until the industry settles on a few common approaches to the problem that we can implement and support.

Hi!

At Kiwibot we have had similar needs (planning across large outdoor spaces where an occupancy grid is impractical) and have been using a route server based on the Open Street Maps (OSM) framework. These are some of the points I've liked about it that may be worth considering for nav2's:

  • If outdoor navigation is expected to be a common use case of the route server, its nodes should be able to take GPS coordinates as well as Cartesian coordinates.
  • OSM stores all the information related to edges semantically in human-readable key-value pairs, which makes it easier to grasp spatially what you may want to store on the map. All this information is then processed according to a profile that specifies how it translates to costs for planning in the graph. This would be IMO a better approach than directly defining costs on graph edges.
  • OSM offers a rich set of tools to visualize and graphically edit maps. rviz may already cover the visualization part; for editing, I think users would prefer dragging and dropping nodes over their map (occupancy grid or GPS) and then connecting them manually with clicks, rather than writing their map file directly in XML or YAML.

Though I think OSM has too many features that may not be needed for nav2, IMO it would make a great starting point given its maturity and the long time it's been around. There is also potentially a lot of tooling that can be reused if the framework is compatible enough.

@smac interesting idea for this server. Will it solve for multiple concurrent robots?

We would use telemetry from robots, and IVA as input to ML for edge weight prediction. For this to work we need to populate weights at run-time, and weights may change over time (i.e. time of day, weather, congestion data reported from intelligent-video-analytics).

Supporting hardware accelerated graph solver(s) like cuOpt for route optimization as a plug-in would be good; this couples with having predicted models of graph weight costs above.

Will the Route Server run on the robot, or on an Edge | Cloud platform as a service for multiple robots?

Thanks.

Definitely, thanks. That was in my head but not on my sheets. It's been added to the post above.

Can you provide more information about this profile and conversion?

I think the idea is to provide an edge plugin able to override or add to the statically set costs in the file for each edge. That's what would enable live-punishing of blocked corridors or the other constraints folks brought up above. It would be nice to know what kind of information you'd like made available for an ML-based application, to help in API design. Obviously you can pull in additional information from other sources / topics, but it would be good to know what you'd like from the server beyond the edge / terminal nodes.

Noted!

There's no particular restriction or requirement placed on that. It can be run wherever you'd like, but being runnable on the robot is a base requirement.

In OSM you can add human-readable tags to graph edges or groups of graph edges, so the map itself stores information like this (it's actually XML in OSM, but I'm using YAML for readability):

way1:
  nodes: [node1, node2, node3]
  type: sidewalk
  speed: 1m/s
  people_traffick: heavy

Routing engines usually take this map and translate it into a graph where each edge has a cost, usually expressed as the time the robot would take to traverse that edge. This cost is obtained by parsing the above tags according to what they call a profile, which is basically a short script specifying how each tag should be weighted when calculating the traversal time of each edge. It would look something like this:

if(sidewalk) add 5s to the time
if(people traffic is heavy) speed is going to be half

See a real profile here. Though it's more complex, it operates under the same principle.
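The two pseudo-profile rules above could be written as a small Python function that maps the example way's tags (`type`, `speed`, `people_traffick`) to a traversal time. This is a toy version for illustration, not how any particular OSM routing engine expresses its profiles; `str.removesuffix` needs Python 3.9+.

```python
from typing import Dict


def traversal_time_s(length_m: float, tags: Dict[str, str],
                     default_speed_mps: float = 1.0) -> float:
    """Toy routing profile: turn human-readable edge tags into a traversal time (s)."""
    speed = default_speed_mps
    if "speed" in tags:
        speed = float(tags["speed"].removesuffix("m/s"))  # e.g. "1m/s" -> 1.0
    if tags.get("people_traffick") == "heavy":
        speed /= 2.0  # heavy people traffic halves the speed
    time_s = length_m / speed
    if tags.get("type") == "sidewalk":
        time_s += 5.0  # sidewalks add a fixed 5 s
    return time_s
```

For a 10 m segment of `way1`, the speed drops from 1 m/s to 0.5 m/s (20 s) and the sidewalk rule adds 5 s, giving 25 s. The map keeps only semantic tags, and changing the profile re-weights the whole graph without editing the map.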

I find this approach more user friendly than setting raw numeric costs on edges.

I see. Does OSM have people_traffick kind of tags that you're using, or are you expanding on the standard with things of interest that you feed into the route engines, with a parser for your custom fields (or the profile)?

I have no doubt that it will be necessary to have a file format for this work, but the question is open as to whether we can adopt something existing or should create our own (or not even limit it: have some required tags like nodes/positions/etc., but then additional tags can be created to your heart's desire, which your custom EdgeScorerPlugin can fetch to score based on your custom fields. The core server wouldn't have to know or care about them in particular, just communicate to the CostAnalyzer plugin to use what it wants).

I do have some work from several years ago which extends OSM for more mobile-robotics-centric navigation semantic representations, but I was hoping to avoid pulling that work in as a precursor to the Route Server work (though that may become unavoidable). The (silent) plan was to eventually migrate over to that, but I wanted to get something out first before getting bogged down in standards. Building "the thing" can often help you understand your requirements for standardization better than making something in a vacuum.

OSM has some standard tags, but you can define your own and you are not forced to use theirs, except perhaps for the coordinates and a UUID for each node.

OSM mostly uses XML files, though it supports JSON and other formats as well. If users are expected to interact directly with the map files, I think something like JSON or YAML would be more readable; however, in OSM this is rarely the case, as there are a lot of GUI tools for editing maps. That would be very useful for nav2's route server as well.

I personally like this idea, and I think that's how OSM routers commonly work. You are never expected to change the core functionality of the routing algorithm, just somehow provide it with a way of interpreting your custom information in the map.

It would be interesting to set a static policy file that it could use, but I think the plugin method would be more general and within the same kind of "theme" ROS developers expect with plugins. Also, I think it would allow you to do more dynamic things, like checking costmaps or other information sources, for more complex analysis than could be supported by a file of metadata.

Ah, so it's mostly just XML with a couple of required tags. I'll look into the required elements later. My plan was to use XML anyway, so this would probably be the path of least resistance if it comes with other tooling or makes it easier for others to integrate their internally developed systems.
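A rough sketch of reading OSM-style XML with Python's standard `xml.etree.ElementTree`: `node` elements carry `id`/`lat`/`lon`, and `way` elements list `nd` references plus free-form `tag` key/value pairs. The sample reuses the custom tags from the example way above. As a simplifying assumption, each way becomes directed edges in listed order only; a real loader would also handle two-way ways, node tags, and validation.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

SAMPLE = """<osm version="0.6">
  <node id="1" lat="48.0000" lon="11.0000"/>
  <node id="2" lat="48.0010" lon="11.0000"/>
  <node id="3" lat="48.0010" lon="11.0010"/>
  <way id="10">
    <nd ref="1"/><nd ref="2"/><nd ref="3"/>
    <tag k="type" v="sidewalk"/>
    <tag k="people_traffick" v="heavy"/>
  </way>
</osm>"""


def parse_osm(xml_text: str):
    """Return ({node_id: (lat, lon)}, [(from_id, to_id, tags), ...])."""
    root = ET.fromstring(xml_text)
    # Required data: node ids and coordinates.
    nodes: Dict[str, Tuple[float, float]] = {
        n.get("id"): (float(n.get("lat")), float(n.get("lon")))
        for n in root.iter("node")
    }
    edges: List[Tuple[str, str, Dict[str, str]]] = []
    for way in root.iter("way"):
        refs = [nd.get("ref") for nd in way.iter("nd")]
        # Everything else is free-form tags, passed through untouched so that
        # scoring plugins can interpret whatever custom fields they care about.
        tags = {t.get("k"): t.get("v") for t in way.iter("tag")}
        for a, b in zip(refs, refs[1:]):
            edges.append((a, b, tags))
    return nodes, edges
```

The loader only understands ids, coordinates, and connectivity; every other key rides along as opaque metadata, which matches the idea of a core server that doesn't need to know about custom fields.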

I could see why the Route Server would also be a place of potential interest.

This probably falls more into designing the map markup to integrate information like 'navigating this edge involves interacting with an external system'.

That implies that the Route Server is doing stuff during execution as a long-running process instead of just providing a route on request.

This seems inevitable assuming the route server is monitoring route execution, re-routing around blockages, etc.

We're working on something similar, and I think it's likely our use case is fairly application-specific. One thing I'm sure can be common ground is how route information gets tied into the existing global- and local-planning pipeline.

This is great to hear that there are plans to implement this into Nav2! We will probably be interested in transitioning over to this rather than using our own system if it meets our needs, and of course will be willing to help in ways we can. Here are a few points relevant to our application that I imagine could also be relevant to others:

  • The RFM we are using requires that each robot give real-time updates of the nodes it passes as it reaches them. In order to provide this feedback, it seems like there would have to be some sort of monitoring going on that is aware of the graph structure. You have suggested that you hope not to need to continuously evaluate the route server, but this could be one motivation for considering some sort of in-route monitoring.
  • We sometimes need to command the vehicle to travel through a specific sequence of consecutive waypoints or edges, rather than allowing it to plan its own route using a search algorithm. Will there be a way to do this in the design you currently have in mind?
  • Would you only have the option of importing the graph from XML, or would there also be a ROS2 service request (or something similar) for adding (or even replacing) edges/nodes in the graph?

The requirement that the navigation graph be independent of generation method seems to open the way for anyone to create something like to be used with the route server. I am curious whether there are any plans to create a graph creator tool like this within Nav2, even if it's just in RVIZ, or whether it will just be left to the user to figure out the best way to generate their own maps.

I second this thought. It could probably be done using NavigateToPose if there is some sort of interface for getting the pose of a node in a graph, but it might be convenient if it is baked into this (at least as an option, some users may just want it to fail if it is not close enough to a defined node.)

This work does not aim to support multiple robot systems for central or decentralized fleet management. This assumes we're only caring about the behavior of a single robot. The technology in this field for multirobot planning / fleet management / traffic conflict resolution isn't mature or generalizable enough for us to include into Nav2 at this time.

I'd like to selfishly push back on this limitation, since my whole motive here is to do exactly this (allow the ROS2 nav stack to be directly compatible with multi-agent planning frameworks), and I believe the route server being discussed here can be an excellent foundation for achieving that if we just squeeze one simple interface requirement into it, which I'm about to describe.

I had mentioned supporting a variety of events (e.g. passing through automated doors and using elevators) as part of the navigation problem. In RMF, one such navigation event is "wait for traffic". A multi-agent traffic planner can determine where a robot should wait to avoid traffic conflicts and then monitor the situation on the ground to signal when the robot is clear to proceed. This is conceptually not too different from telling a robot to wait in front of a door and then signalling to the robot when the door is open so the robot can proceed. If we can encode this idea of waiting on events into the nav stack in a generic, extensible way, then the nav stack could easily support optional multi-agent coordination in a way that is not intrusive at all. I'll start with this very rough diagram that represents a specific chunk of the nav stack within my proposal [source]:
[Diagram: RouteServer]

I would propose this for the (minimal) output of the route server:

# nav2_msgs/Route.msg
nav_msgs/Path path
uint32[] checkpoints

The path field is obvious: a sequence of poses that the local planner should treat as goals. The checkpoints field would be an array of indices of path.poses where the robot should pause to wait for a signal that it is allowed to proceed. If checkpoints is empty then the robot can immediately traverse the whole path without stopping or waiting. The signal for permission to proceed at checkpoints could look like this:

# nav2_msgs/Clearance.msg
Header header
uint32 for_path
bool[] checkpoints

E.g. If Route.checkpoints contains [1, 4, 7] then the robot may need to pause when it arrives at path.poses[1], path.poses[4], and path.poses[7] depending on the values in the latest Clearance.checkpoints:

  • [false, false, false]: Pause at 1
  • [true, false, false]: Proceed past 1 but pause at 4
  • [true, false, true]: Proceed past 1 but pause at 4
  • [true, true, true]: Proceed through all checkpoints
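The pause rule in these examples can be captured in a few lines. This sketch returns the pose index where the robot must stop next, or `None` if it may traverse the whole path. Following the proposal, a missing or mis-sized clearance array is treated as fully false. The function name is hypothetical.

```python
from typing import List, Optional


def next_pause_index(route_checkpoints: List[int],
                     clearance: Optional[List[bool]]) -> Optional[int]:
    """Pose index of the first checkpoint not yet cleared, or None if all are clear."""
    n = len(route_checkpoints)
    # No Clearance received, or a size mismatch, means a fully false array.
    if clearance is None or len(clearance) != n:
        clearance = [False] * n
    for pose_index, cleared in zip(route_checkpoints, clearance):
        if not cleared:
            return pose_index
    return None
```

With `Route.checkpoints == [1, 4, 7]`, this reproduces the four cases listed above.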

If a checkpoint's clearance is true before the robot arrives, then the robot does not need to pause or even come to a stop at that point. If no Clearance message has arrived, then the controller must assume a fully false array. We would also have the following constraints on the content of the Clearance message:

  • Clearance.for_path must match Route.path.header.seq
  • Clearance.header.seq must increment for each subsequent update (but can reset to 0 for each new value of Clearance.for_path)
  • The size of Clearance.checkpoints must match the size of its corresponding Route.checkpoints or else it is treated as a fully false array
  • The elements inside Clearance.checkpoints must not revert from true values to false values in subsequent updates. That means the node determining clearance must not issue a true value for a checkpoint until the robot is permanently guaranteed to have clearance at that checkpoint.
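On the controller side, enforcing these constraints could look like the following sketch. The `Clearance` dataclass mirrors the proposed message (with `seq` standing in for `header.seq`), and one tracker is assumed per route, which is what lets `seq` restart for each new `for_path`. Rejecting an update that would revoke a granted clearance is a design choice of this sketch; the proposal states it as an obligation on the publisher.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Clearance:
    seq: int                 # stands in for Clearance.header.seq
    for_path: int
    checkpoints: List[bool]


class ClearanceTracker:
    """Hypothetical per-route bookkeeping that enforces the Clearance constraints."""

    def __init__(self, route_seq: int, num_checkpoints: int):
        self.route_seq = route_seq
        self.num_checkpoints = num_checkpoints
        self.last: Optional[Clearance] = None

    def effective(self) -> List[bool]:
        """Clearance the controller should act on right now."""
        if self.last is None or len(self.last.checkpoints) != self.num_checkpoints:
            return [False] * self.num_checkpoints  # missing or mis-sized: all false
        return list(self.last.checkpoints)

    def update(self, msg: Clearance) -> bool:
        """Store msg if it satisfies the constraints; return whether it was accepted."""
        if msg.for_path != self.route_seq:
            return False  # clearance refers to a different route
        if self.last is not None:
            if msg.seq <= self.last.seq:
                return False  # stale or out-of-order update
            prev = self.effective()
            if len(msg.checkpoints) == len(prev) and any(
                    p and not c for p, c in zip(prev, msg.checkpoints)):
                return False  # a granted clearance must never revert to false
        self.last = msg
        return True
```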

At the same time, the route server can publish a separate message describing the nature/purpose of those checkpoints so that some separate event handler node can watch the progress of the robot and handle relevant events. This separate message describing the checkpoints could be standardized, but it could also be a custom message determined by the user's choice of a route server plugin. Example for a very generic message that could potentially be standardized:

# nav2_msgs/GenericCheckpoint.msg

# Key for how to interpret the description of this checkpoint,
# e.g. "door", "elevator", "traffic"
string category

# Description of the checkpoint which depends on the category, e.g.:
# * door: the name of the door
# * elevator: a json message describing the name of the elevator and floor of entry
# * traffic: a json message describing what other robots need to be waited on at this checkpoint
string description

# nav2_msgs/GenericCheckpoints.msg
uint32 for_path
GenericCheckpoint[] descriptions

The event handler node would listen for checkpoint description messages for the current route and track the progress of the robot along that route to determine:

  • What commands to send to doors, elevators, etc (and when to send them)
  • When to update the checkpoint clearance for the controller
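A minimal sketch of that event handler loop, assuming progress is reported as the index of the last path pose reached and that each category maps to a handler callable returning `True` once the checkpoint is satisfied (e.g. a door handler would send its command and return `True` only after the door reports open). The `lookahead` parameter, the handler signature, and the class names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class GenericCheckpoint:
    category: str       # e.g. "door", "elevator", "traffic"
    description: str    # category-specific payload


class EventHandler:
    """Hypothetical node logic: acts on upcoming checkpoints and builds the clearance array."""

    def __init__(self, checkpoints: List[int], descriptions: List[GenericCheckpoint],
                 handlers: Dict[str, Callable[[str], bool]]):
        self.checkpoints = checkpoints
        self.descriptions = descriptions
        self.handlers = handlers
        self.clearance = [False] * len(checkpoints)

    def on_progress(self, pose_index: int, lookahead: int = 2) -> List[bool]:
        """Call as the robot advances; returns the clearance array to publish."""
        for i, (cp, desc) in enumerate(zip(self.checkpoints, self.descriptions)):
            if self.clearance[i]:
                continue  # granted clearances are never revoked
            if cp - pose_index > lookahead:
                break  # too far ahead to act on yet
            handler = self.handlers.get(desc.category)
            if handler is None or not handler(desc.description):
                break  # stop at the first checkpoint that is not clear yet
            self.clearance[i] = True
        return list(self.clearance)
```

Clearing checkpoints strictly in order keeps the published array monotonic, and the lookahead controls "when to send" commands to doors and elevators relative to the robot's approach.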

Why are checkpoints handled by the controller server instead of being handled by the waypoint follower?

There's some conceptual overlap between what I'm proposing for checkpoints and what already exists for following waypoints, but I think these should be handled separately for the following reasons:

  • The input to the Waypoint Follower is decided at the application layer whereas the checkpoints Iā€™m proposing are inferred by the Route Server when finding a solution to incoming navigation requests
  • Checkpoints might require the robot to come to a stop or might not depending on exactly when clearance is given. For the smoothest possible behavior, the controller server itself should be aware of checkpoints and clearances so it can make quick decisions about what velocities to command.

Why aren't these checkpoint events determined by a behavior tree?

My understanding (which I invite others to correct) is that the behavior trees in the nav2 stack are made by humans based on the desired behaviors for their application. I believe that concern is orthogonal to determining when checkpoints are needed, since checkpoints are inferred while finding a solution to the navigation problem.

Perhaps the proposed Event Handler node could accept behavior trees that describe how to handle the different checkpoint categories, but ultimately I believe that the problem of figuring out when/where/what checkpoints need to exist has to be solved by the route server based on information provided by the navigation graph.

Is there a precedent for this proposal?

What I'm proposing aligns very nicely with the VDA5050 industry standard. While it's true that a VDA5050 bridge was recently released for ROS2, I believe that bridge could be significantly simplified and improved by incorporating this proposal into the nav2 stack.

This proposal is also based heavily off of my experience in implementing traffic and event management in Open-RMF. The current implementation of Open-RMF suffers from a lot of unnecessary traffic stoppages that could be eliminated if the proposed checkpoint system could be incorporated directly into the controller server.

I think triggering new instances of re-routing could be done from the behavior tree when an issue occurs, at a regular frequency, or on other conditions like we have for the planner server (e.g. fixed frequency, based on distance traveled, based on speed, based on analyzing the path for continual validity). But we could allow the Route Server to have more control over this and be a continually running action server. It would complicate the BT logic and determinism for orchestrating behavior, though, since planning would no longer be event-based. I prefer BT logic for triggering planning runs over the internal server, but that's not off the table. I don't like the idea of the route server replanning on its own schedule, though.

I was thinking it would be sensible to have 2 objects: the Route Server and the Route Analyzer (toy names, I make no promise they will be these exact names). The server does the route planning and such. The analyzer would be a separate server that would take the route and current state information and do things like trigger callbacks when we pass a node or enter / exit an edge. However, there's no reason these need to be truly independent: the Analyzer could live in the Server to have in-memory access to the same information (but could also be launched separately if that's preferable due to the breakdown of the compute network). That architecturally separates them for the "dumb" server analog to the planner server, and it's a 1-param flip to instantiate the Analyzer in the Server to enable "smart" active background processing. I think that's a good move to separate the concerns as well as to support both the basic behavior profiles I'm looking for and what others are asking for. :slight_smile:
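To make the Analyzer idea concrete, here is a proximity-based sketch: it is fed robot positions and fires callbacks when a route node is reached and when an edge is entered or exited. The arrival-radius criterion is only one assumption about "passing" a node (the question of how smart this needs to be is still open), and all names are hypothetical.

```python
import math
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]


class RouteAnalyzer:
    """Hypothetical tracker firing node-arrival and edge enter/exit callbacks."""

    def __init__(self, route: List[Point], on_event: Callable[[str, int], None],
                 arrival_radius: float = 0.5):
        self.route = route                        # node positions in route order
        self.on_event = on_event                  # called as on_event(kind, index)
        self.radius = arrival_radius
        self.current_edge: Optional[int] = None   # edge i joins route[i] -> route[i + 1]
        self.next_node = 0

    def update(self, pose: Point) -> None:
        """Feed the latest robot position; advance past every node now within radius."""
        while (self.next_node < len(self.route)
               and math.dist(pose, self.route[self.next_node]) <= self.radius):
            if self.current_edge is not None:
                self.on_event("exit_edge", self.current_edge)
            self.on_event("node", self.next_node)
            if self.next_node + 1 < len(self.route):
                self.current_edge = self.next_node
                self.on_event("enter_edge", self.current_edge)
            else:
                self.current_edge = None  # final node reached
            self.next_node += 1
```

The callbacks are where node-passing reports (as an RFM might require), speed-limit changes, or door commands would hook in, without the planning part of the server knowing about them.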

How "smart" is this? Is it simply a question of proximity to the node, or of passing path markers corresponding to that region? I'm curious how sensitive you need this to be and how you balance that today with practical considerations.

It was not in mind, but it's on my radar now :slight_smile: That may be something worth discussing in more detail outside this thread when I'm sitting down and designing the interfaces. I don't have an off-the-cuff answer at the moment that I'm happy enough with to share.

Having a service to replace the graph is definitely on the table. That probably won't be in the initial prototype, but it's a trivial contribution that can be made shortly thereafter. There will be some graph object that such a service would simply modify or replace using the request fields.
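
A minimal Python sketch of what such a graph object might look like, assuming a whole-graph swap: planning requests take a snapshot reference, and a service callback validates and replaces the graph without disturbing in-flight requests. `GraphHolder`, `snapshot`, and `replace` are hypothetical names, and the bare adjacency dict stands in for whatever graph type the Route Server ends up using.

```python
import threading
from typing import Dict, List

# Hypothetical graph: node_id -> list of successor node_ids.
Graph = Dict[int, List[int]]


class GraphHolder:
    """Hypothetical container letting a service swap the navigation graph
    while planning requests keep a consistent snapshot."""

    def __init__(self, graph: Graph) -> None:
        self._lock = threading.Lock()
        self._graph = graph

    def snapshot(self) -> Graph:
        # Planners grab a reference once per request. A later replace()
        # rebinds the holder's reference but never mutates this object,
        # so the snapshot stays consistent (as long as no one edits it in place).
        with self._lock:
            return self._graph

    def replace(self, new_graph: Graph) -> bool:
        """Body of a hypothetical "set graph" service; returns success."""
        # Reject graphs with edges that point at undefined nodes.
        for targets in new_graph.values():
            if any(t not in new_graph for t in targets):
                return False
        with self._lock:
            self._graph = new_graph
        return True
```

Swapping the whole object rather than editing it in place keeps the service trivial and avoids a planner ever seeing a half-updated graph.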

The checkpoints example is illustrative, and I'll take it into account during the initial design work so that plugin interfaces exist which would allow you to do that. My goal is not to preclude behavior, but to provide some set of X, Y, and Z behaviors which I feel are appropriate, generalizable, and sufficiently mature that I'm committing to their long-term maintenance.

The bits below are to give you my thought process and context, with no negative feelings or opinions at all. Please read them in that spirit: I want to shine light on why I think this way as a maintainer trying to provide a great user experience.


That's not to say that others cannot use those interfaces and plugins to build the behavior they're looking for. It's just that multi-robot scheduling, planning, and coordination is a wide field, and there's no single class of method that works for a broad enough distribution of the user-base. The example you show is still largely single-robot planning with some business logic to stop to ask if continuing is cool - I think that's a totally doable thing here, but that doesn't help the person trying to do centralized coordination of a massive fleet nor decentralized active traffic management when other robots come into their zones. When I say "Nav2 supports all types of robots", it's because I and the working group have spent years building new planners, controllers, and capabilities to support the cross-section of circular/non-circular, differential, omnidirectional, Ackermann, and arbitrary-kinematic robot base types.

I don't want to be put in a situation where I open a can of worms by stating that "Nav2 supports multi-robot navigation" and then get flooded with tickets / requests / complaints that I don't support what they need (which, on the surface, is an absolutely reasonable capability request) and that I've said something deceptive or untrue in their opinion. That's a bad user experience and what's scientifically referred to as "a bad time" for me. I could easily see a world where, if I committed to providing the same level of behavior and cross-section of algorithms for multi-robot navigation as I've worked hard to provide in other areas, it could consume the next 3-5 years of my active development time, and that's simply not our goal at this moment. I'm not saying it won't ever be, but it isn't today.

Additionally, the companies looking for multi-robot scheduling / coordination / planning are typically well-funded (due to being extremely profitable), with highly capable engineers working on such problems. A 1% change in fleet performance from that multi-robot layer can save a client millions of dollars. As a result, I'd almost assume that a layer like that provides mere tutorial value for students and initial prototype value for startups, since they will need to create something bespoke for their needs shortly afterward to compete in the growing market. When we add a significant new capability to Nav2, it's important to me that it is not research-quality but ready for prime time, and that it can and should be used by companies in their production environments.

The problem that Open-RMF looks to solve is slightly different and lends itself more naturally to open-source development. Since it's not about optimally coordinating a fleet but about handling discrepancies and issues between fleets, that is indeed something of interest, with non-tutorial, non-prototype-level utility if made well. However, the point above remains: I don't want to make a statement about multi-robot-ing in Nav2. That does not mean I think Nav2 cannot be used that way, or that we should not consider those applications in our designs. But I believe the plugins and specific developments to support it should live somewhere outside of the main stack so as not to invite inappropriate conclusions from passive bystanders (e.g. navigation2_auxiliary, navigation2_multirobot, or part of Open-RMF's organization). There are indeed intentional choices already made within Nav2 with an eye to the needs of multi-robot users, but it's notable that I never outright mention it. This is a very intentional decision until such time as we can commit to providing and supporting, for the long term, the cross-section of common user needs in multi-robot fleet management within the Nav2 ecosystem's core (multi-fleet coordination, centralized and decentralized fleet planning, fleet planning on graphs and in free space).


So with that context, please note that I'm not saying I'm going to throw out all multi-robot needs, particularly those that are required for, or beneficial to, a stronger binding with Open-RMF. However, multi-robot support won't be a first-class citizen within Nav2's provided implementations, and at some point we may need to make compromises that preclude certain less-common types of multi-robot-ing in exchange for code quality / maintainability / utility for the primary user-base. From what you describe @grey, that's all 100% on the table, and I see no reason it couldn't and shouldn't be considered for what's to come here. But rather than seeing a clearance feature built in, you may instead find a nice plugin interface where it can be injected. I view integrations with Open-RMF as strategically important for Nav2 (and vice versa, I imagine) to grow the pie of the ecosystem and maximize contributions.

I suppose in summary: plugins, plugins everywhere!


Thanks for that detailed reply. I fully understand and respect your position of preventing feature creep in the core of nav2 as its maintainer.

The exact goal of my last post was to propose an acceptable, minimal, general-purpose capability for nav2 that would allow third-parties (e.g. me) to introduce multi-agent coordination and infrastructure event handling into a nav2 system without needing to fork any source code (i.e. using plugins). If you believe that the checkpoint interface should be handled through a custom plugin interface instead of through a standardized message interface, then I have no objection to that.

The example you show is still largely single-robot planning with some business logic to stop to ask if continuing is cool - I think that's a totally doable thing here, but that doesn't help the person trying to do centralized coordination of a massive fleet nor decentralized active traffic management when other robots come into their zones.

The state of the art for multi-robot coordination that I've seen in industry tends to have exactly what I've described within the robot control layer: a path that the robot needs to follow, annotated with information about whether it needs to stop at each waypoint. I'll refer again to VDA5050, which bears a strong resemblance to my proposal. Decisions about when and why those checkpoints switch from false to true will be based on complicated algorithms along with centralized and/or distributed decision making, but the robot control layer will look the same regardless.
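
The control-layer side of that idea is small. Here is a minimal Python sketch, assuming each waypoint carries a `must_stop` flag that an external coordinator clears: the robot may drive smoothly through cleared waypoints and only needs to plan to rest at the first waypoint still requiring a stop. `Checkpoint`, `stop_index`, and `grant_clearance` are hypothetical names, not part of VDA5050, Nav2, or Open-RMF.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Checkpoint:
    """Hypothetical annotated waypoint: position plus whether the robot must stop here."""
    x: float
    y: float
    must_stop: bool  # True until an external coordinator grants clearance


def stop_index(path: List[Checkpoint], progress: int) -> int:
    """Index of the waypoint the robot should plan to come to rest at.

    `progress` is the index of the next waypoint not yet reached (path is
    assumed non-empty). The robot continues through cleared waypoints and
    stops at the first one still flagged, or at the end of the path.
    """
    for i in range(progress, len(path)):
        if path[i].must_stop:
            return i
    return len(path) - 1


def grant_clearance(path: List[Checkpoint], index: int) -> None:
    """Called when the coordinator (centralized or distributed) clears a checkpoint."""
    path[index].must_stop = False
```

Nothing here depends on how the clearance decision is made, which is the point of the argument above: the coordination algorithm can change without the controller-facing interface changing.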

My point being, I do believe the checkpoint proposal will be of immense value to organizations that are implementing coordination for massive fleets. It would allow them to integrate their centralized or decentralized fleet traffic coordination system directly into nav2 rather than needing to work around the limitations of nav2 that are derived from it being scoped to a single robot.

In fact, a goal I have for the plugin(s) I'm planning is that all robots using them within the same site will have near-optimal traffic coordination with each other (I say "near"-optimal because sometimes optimality needs to be compromised a little for expediency). I'll be happy to host that plugin under the open-rmf org as a community contribution so that the core nav2 team is not burdened with fielding questions and complaints about multi-agent coordination.

Anyway, I'm feeling optimistic that we'll find an arrangement for these features that makes everyone happy. I appreciate your continued guidance as we figure out how to collaborate and contribute these enhancements to the community.

Tentatively, I think that would be my plan. But you know how plans go when the rubber hits the road :person_shrugging:. Typically, we add plugin interfaces where there are reasonable expectations of customization. It seems to me that for a clearance-type feature on a given node in the route graph, there could be other similar, but not identical, features needing slightly different messages for slightly different needs. But once a prototype structure takes shape, we can re-evaluate that.
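
To illustrate what "a plugin interface where it can be injected" might mean for node-level features, here is a minimal Python sketch: an abstract per-node operation, a name-based registry, and a third-party clearance plugin built on top. Nav2's real plugins are C++ classes loaded through pluginlib; the registry below is only a stand-in, and `NodeOperation`, `register`, `load_plugin`, and `ClearanceCheck` are hypothetical names.

```python
from abc import ABC, abstractmethod
from typing import Dict, Set, Type


class NodeOperation(ABC):
    """Hypothetical plugin interface invoked when the robot reaches a graph node."""

    @abstractmethod
    def process(self, node_id: int, metadata: Dict[str, str]) -> bool:
        """Return True if the robot may continue past this node."""


_REGISTRY: Dict[str, Type[NodeOperation]] = {}


def register(name: str):
    """Decorator making a plugin loadable by name (stand-in for pluginlib)."""
    def wrap(cls: Type[NodeOperation]) -> Type[NodeOperation]:
        _REGISTRY[name] = cls
        return cls
    return wrap


def load_plugin(name: str) -> NodeOperation:
    return _REGISTRY[name]()


@register("clearance_check")
class ClearanceCheck(NodeOperation):
    """Example third-party plugin: hold at nodes tagged as checkpoints until cleared."""

    def __init__(self) -> None:
        self.cleared: Set[int] = set()

    def process(self, node_id: int, metadata: Dict[str, str]) -> bool:
        if metadata.get("checkpoint") != "true":
            return True  # untagged nodes never block
        return node_id in self.cleared
```

The core stays agnostic about why a node might block, so similar-but-different features would be additional plugins rather than new fields in the main message interface.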

I don't doubt it has a common application in the traditional AMR sector. I would just hope that as the state of technology develops, we're not restricting ourselves to this kind of approach. I'll admit, though, this really starts to open up a can of worms I wasn't looking to open. (Tying interfaces here to the controller server is a non-starter: that will necessarily need to go through the behavior tree, which is the arbiter of "what do we do with what information, when, and how". I'm not about to start letting the Task Servers talk to each other on their own behalf, with their own natural timings and requests; that's the kind of mess that made move_base unsuitable for commercial applications. It's not difficult, though, to add a few new BT nodes and set up BT XMLs to communicate that information to the appropriate servers at the appropriate times.)

Though I suppose since we're talking only about route planning here, and not free-space or hybrid planning, I'm more swayed towards thinking it could be considered a "core" feature, as long as it isn't the start of adding endless use-case-specific items to the main ROS message interface. I'll noodle on this more later this week.

As well :+1:

What's the story with what Open-RMF already provides?


I'm not sure if you're asking about features or about architecture, so I'll cover both, starting with architecture.

Architecture

In the current architecture of Open-RMF, a "fleet adapter" exists in a layer above the robot's software stack and commands routes for its robots based on incoming task requests and the current traffic situation. There are many options for integrating a ROS 2 robot using nav2 with RMF, but the solution we provide out of the box streams navigation requests into the nav stack based on the routes calculated by the fleet adapter. The fleet adapter then tracks the robot's state, waits for it to arrive at each waypoint, and sends a navigation request to move to the next waypoint once "clearance" is available.

This approach has proven functional and "good enough" for many use cases, but it has some drawbacks that I'm trying to address with these proposals:

  • All task requests and application logic need to pass through the RMF fleet adapter, which makes RMF a lot less flexible than we'd like it to be
  • The robot needs to come to a complete stop at each waypoint, even if we already know that it would be fine for the robot to proceed

If we can have a checkpoint system within the nav stack itself, RMF would be able to operate as a sibling of the nav stack instead of above it, which gives users a lot more freedom for how they design their robot applications.

Features

In terms of features relevant to nav2, we have a distributed multi-agent path finding system that works similarly to the conflict-based search algorithm, but it operates on predefined navigation graphs (AGV-style mobile robots) instead of free space. As part of my broader integration efforts (not strictly related to the Route Server topic), I'll be extending this algorithm to also support free-space AMRs.
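
For readers unfamiliar with it: conflict-based search (CBS) plans each agent independently, then repeatedly finds the earliest conflict between paths, adds a constraint for one of the agents involved, and replans. The conflict check is the piece most tied to graph-based planning, so here is a minimal Python sketch of it, assuming synchronous, unit-duration edges and agents that wait at their final node. That timing model is a simplification for illustration; this is not Open-RMF's implementation, which the post only describes as working similarly to CBS.

```python
from typing import List, Optional, Tuple

# (kind, timestep, nodes involved)
Conflict = Tuple[str, int, Tuple[int, ...]]


def first_conflict(a: List[int], b: List[int]) -> Optional[Conflict]:
    """Return the earliest conflict between two timed graph paths, or None.

    a[t] / b[t] is the node each agent occupies at timestep t; an agent that
    has finished its path waits at its last node. Detects vertex conflicts
    (same node, same time) and edge conflicts (two agents swapping nodes
    across the same edge between t and t + 1).
    """
    horizon = max(len(a), len(b))

    def at(path: List[int], t: int) -> int:
        return path[min(t, len(path) - 1)]

    for t in range(horizon):
        if at(a, t) == at(b, t):
            return ("vertex", t, (at(a, t),))
        if (t + 1 < horizon and at(a, t) == at(b, t + 1)
                and at(b, t) == at(a, t + 1)):
            return ("edge", t, (at(a, t), at(a, t + 1)))
    return None
```

In full CBS, a vertex conflict at `(node, t)` would spawn two child searches, each forbidding one of the agents from occupying that node at that time.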

Then we also have event handling (e.g. opening/closing doors, summoning elevators) implemented in the fleet adapter.

Those do indeed seem like sub-optimal behaviors. Where does your equivalent of this route-graph planning live, so I can take a look?

How do you plan to accomplish that with the clearance nodes?

In the proposal above, I assume that would notionally be handled outside of the Route Server (as your diagram previously suggested)


As an aside from design: is this something folks at OR- err- Intrinsic would be interested in collaborating with me on? I need to come up with a plan of attack first, but assuming that looks good to us, is that a point of intersection? There are areas here that are obviously within your expertise and seemingly of interest to you. I'm finishing up a project this week (Famous Last Words™) and will be moseying on over to start on this in the first or second week of January.