Map->base->odom as alternative for REP-105 recommended frame order

REP 105

Looking specifically at the structure laid out in REP 105: as discussed, inserting the odom frame between map and base_link is valuable precisely because of the frequency and latency of updates that @Martin_Guenther laid out above. Full localization is much slower and higher latency than the odometry computations. By separating these two components, and having localization compute only the correction for the drift, you can access the latest position of items in the map frame without waiting for the high-latency localization updates.
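To make that concrete, here is a minimal sketch using 2D poses (x, y, theta). The `compose` helper and the specific numbers are illustrative stand-ins, not the tf2 API:

```python
import math

# A 2D rigid transform as (x, y, theta): a toy stand-in for tf2's 3D transforms.
def compose(a, b):
    """Return the transform a ∘ b (apply b in a's frame)."""
    ax, ay, at = a
    bx, by, bt = b
    return (ax + bx * math.cos(at) - by * math.sin(at),
            ay + bx * math.sin(at) + by * math.cos(at),
            at + bt)

# Slow localization last published this drift correction (map -> odom):
map_to_odom = (10.0, 5.0, 0.0)

# Fast odometry keeps updating between localization updates (odom -> base):
odom_to_base = (2.0, 0.0, math.pi / 2)

# The latest pose of the robot in the map frame is available immediately,
# without waiting for the next localization update:
map_to_base = compose(map_to_odom, odom_to_base)
print(map_to_base)  # (12.0, 5.0, 1.57...)
```

Every time odometry publishes a new `odom_to_base`, the composed `map_to_base` advances, even while `map_to_odom` is stale.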

A simple example of where this is highly valuable is a vehicle that relies on GPS for localization. When you drive through a tunnel you lose your GPS signal. However, if you are navigating through the tunnel, the forks in the tunnel are in the map, and consequently you want to use the last known localization + odometry to ask where the fork is relative to the vehicle. If you don’t chain the transforms and instead keep them on separate branches, you get no updates about where you are in the map until you come out the other end of the tunnel and reacquire GPS. You can program your navigation algorithm to deal with this by detecting that the localization is not updating, then querying at that older time to find your pose in the odom frame, querying where you are relative to that pose now, back-computing your position in the last updated version of the map, and finally computing the angle and heading to your desired landmark in the map now. That is a much more complicated way to solve the same problem, and it requires every implementation to duplicate the same logic.

Publishing the correction transform is not particularly hard. It’s the “difference” between the localized pose and the odometric pose, computed as a composition with the inverse rather than a literal subtraction: T(map->odom) = T(map->base) * T(odom->base)⁻¹
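A sketch of that computation, again with 2D poses as an illustrative stand-in for the full 3D math; the helper names are hypothetical, not tf2 functions:

```python
import math

# 2D rigid transforms as (x, y, theta); a minimal sketch, not the tf2 API.
def compose(a, b):
    """Return the transform a ∘ b (apply b in a's frame)."""
    ax, ay, at = a
    bx, by, bt = b
    return (ax + bx * math.cos(at) - by * math.sin(at),
            ay + bx * math.sin(at) + by * math.cos(at),
            at + bt)

def inverse(t):
    """Return t⁻¹, so that compose(t, inverse(t)) is the identity."""
    x, y, th = t
    c, s = math.cos(th), math.sin(th)
    return (-(x * c + y * s), -(-x * s + y * c), -th)

# What localization computed (map -> base) and what odometry reported
# (odom -> base) at the same timestamp:
map_to_base = (5.0, 3.0, math.pi / 2)
odom_to_base = (1.0, 0.0, 0.0)

# The published correction is a composition with the inverse,
# not a literal subtraction:
map_to_odom = compose(map_to_base, inverse(odom_to_base))

# Sanity check: correction ∘ odometry reproduces the localized pose.
recovered = compose(map_to_odom, odom_to_base)
```

The localization node publishes `map_to_odom` at its own slow rate, while the odometry node keeps publishing `odom_to_base` at high rate; consumers compose the two.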

I would strongly recommend that this approach still be used regardless of whether the tree structure is a limitation or not. This structure was born from many experiences dealing with localization issues on both full-size autonomous vehicles and indoor robots such as the PR2. It is not the simplest for teaching, but it is designed to be the most robust.

TF Tree Data Structure

Thanks @tim-fan for referencing the paper where a lot of this is already laid out.

This has been discussed several times before. One recent thread on this can be found at: Allow Multiple Parents at Top Level of TF · Issue #437 · ros/geometry2 · GitHub

There are several dimensions of efficiency that the data structure helps with, as well as robustness. Queries of a directed tree are very efficient.

Query efficiency

  • Queries in a directed tree are O(d), where d is the depth of the tree. (There’s further analysis of querying a specific time in history, but I’m going to exclude that as it’s independent of the data structure.) If this were changed to a generic polytree, the cost climbs to closer to O(N), where N is the number of nodes in the graph.

  • Maintaining the history of each frame is the most common operation, so it needs to be very low overhead. On receipt of new data it’s typically just appending to a history, which is O(1); in the worst case it’s O(n), where n is how far into the history vector an out-of-order sample must be inserted.

  • Pruning is removing from the back of the list, which is also an O(1) operation, and is done continuously in the background.
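The O(depth) query cost follows directly from the single-parent invariant: each frame stores only its parent, so resolving a lookup is a walk toward the nearest common ancestor rather than a general graph search. A toy sketch (not the tf2 implementation; frame names are illustrative):

```python
# Each frame records only its parent, so the whole graph is a directed tree.
parent = {
    "laser": "base_link",
    "base_link": "odom",
    "odom": "map",  # the correction published by localization
}

def chain_to_root(frame):
    """Follow parent pointers until we fall off the top of the tree."""
    chain = [frame]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain

def lookup_path(source, target):
    """Frames traversed for a source->target lookup, via the nearest
    common ancestor; cost is bounded by the depth of the tree."""
    up = chain_to_root(source)
    down = chain_to_root(target)
    # Drop the shared frames above the nearest common ancestor.
    while len(up) > 1 and len(down) > 1 and up[-2] == down[-2]:
        up.pop()
        down.pop()
    down.pop()  # drop the ancestor itself; it is already the last entry of `up`
    return up + list(reversed(down))

print(lookup_path("laser", "odom"))  # ['laser', 'base_link', 'odom']
```

In a polytree with multiple parents per frame there can be several such paths, which is exactly the ambiguity discussed below.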

Unfortunately it is not just efficiency that relaxing the constraint will affect. You start getting into the problem of dealing with cycles in the graph. There are fundamental problems if two nodes publish transforms which create a loop or diamond in the graph. There’s no way to consistently resolve a traversal through the tree unless you start encoding heuristics, or you start creating priorities or some other mechanism to select branches. Who should get the errors? Should queries fail and throw an exception if the tree has multiple paths through it?

Currently there is no requirement for the library to evaluate incoming transforms, and the directed tree can be assembled efficiently at the time of query. The only way that the graph can be malformed is if a cycle exists, which is clearly problematic but also quick to detect and generally hard to create accidentally.

If we want to add support for a more complex graph structure, we need to start thinking about how to communicate that more complex structure. Right now transforms only declare their parent, which produces a directed tree. If a transform changes parents at time T, then a query before time T walks to the old parent, and a query at time >= T walks to the new parent. Without that restriction we would need to keep track of breaking parent links explicitly, which means extending the published data to include more information to explicitly break links, as they would no longer be implicitly broken.
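The implicit link-breaking can be sketched as follows; the data layout here is an assumption for illustration, not how tf2 stores its buffers:

```python
import bisect

# Each frame keeps its parent history as (stamp, parent) records.
# A new record implicitly breaks the previous link, so "who is my
# parent at time t?" is just a lookup of the latest record <= t.
parent_history = {
    "base_link": [(0.0, "odom_a"), (10.0, "odom_b")],  # re-parented at T=10
}

def parent_at(frame, stamp):
    records = parent_history[frame]
    stamps = [s for s, _ in records]
    i = bisect.bisect_right(stamps, stamp) - 1
    if i < 0:
        raise LookupError(f"no parent known for {frame} at t={stamp}")
    return records[i][1]

print(parent_at("base_link", 5.0))   # odom_a  (before T, walks the old parent)
print(parent_at("base_link", 12.0))  # odom_b  (>= T, walks the new parent)
```

With multiple simultaneous parents allowed, this single sorted history no longer suffices: each link would need its own explicit start and stop messages, with all the lossy-transport problems described next.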

And as we start to get stateful changes with explicit starts and stops, we need to make sure that our system is robust to running in a distributed, lossy system in which not all messages are guaranteed to arrive, nor to arrive in a specific order or sequence. So we need to start thinking about a reliable transport or some other mechanism for the transform information. TF is currently eventually consistent, and any lost message has only a very local effect.

The nice thing about the directed tree is that it does not need any maintenance and can be built dynamically while it’s queried, yet stays efficient. If you go to a more advanced graph, then to keep operations efficient you’ll need to start maintaining auxiliary data structures to keep queries quick, which starts accumulating computational cost. This is especially true because the graph may have a different structure at any given time, so you need to maintain parallel graphs whenever there’s a change in connectivity. And if the maintenance process encounters any problems with the graph, who gets the error or exception? The only place the feedback can effectively come out is the querying process, whose query will fail, plus potentially some global warning/error messages. How do you deal with race conditions between building the graph and resolving errors in the graph? Data arriving over a higher-latency connection might arrive later. If a state-change message is lost or dropped, does the tree stay disconnected or connected? For how long is an old connection still valid and walkable?

At a high level, there are a lot of other complexities that come in if you relax this constraint. I don’t think they’re insurmountable, but I don’t have a clear picture of how all these additional complexities will benefit end users, and I worry that they will actually hurt end users with, at minimum, more computational load and potentially an even more complex interface which has to expose the higher complexity through the end user API.

The difference I see here is that tf “clients”, aka end users, don’t care about the structure or direction of the graph: the library will resolve the query correctly whichever way you query it. The one requirement falls on tf publishers, each of which is responsible for the link it maintains. “Flipping” a link is just publishing the inverse, which is typically trivial if you’re using any linear algebra library. And, as mentioned, differencing before you publish a lower-frequency transform is also relatively low cost. There are not many nodes which need to publish transforms for localization. And as you’ve demonstrated, the logic can be written completely generically in about 15 lines of code and embedded into any node before publishing the transform, including a fully parameterized generic one.
