REP-105 has been around for a long time, but most people seem to get confused while learning the transforms map → odom → base_link.
Why does it confuse people? Well, the structure that REP-105 specifies runs against intuition:
The map frame is the parent of odom, and odom is the parent of base_link. Although intuition would say that both map and odom should be attached to base_link, this is not allowed because each frame can only have one parent.
The diagram below compares REP-105 to an intuitive transform relationship.
The intuitive relationship isn’t possible due to the usage of trees to store the transforms, and base_link can’t have two parent frames. (At this point, an alternative implementation of tf that uses graphs rather than trees seems to be ideal - I see some work being done for this in a stale transform_graph package but would like to know if it’s worth reviving this work).
When it comes to localization, packages usually calculate the transform from map to base, but then listen to the odom-to-base transform and subtract it to produce the map-to-odom transform. The following diagram, taken from the amcl wiki, illustrates this:
The map to odom frame is accounted for as the Odometry Drift in AMCL. However in a localization system (that I’m working on) that can handle a kidnapped robot with no initial position estimate, accounting for this transform as Odometry Drift doesn’t make sense, and there seems to be no meaning to the transform itself.
A more intuitive (at least to me) alternative that gets around the tree structure issue would be to have the odom frame as a child of base_link, as illustrated below:
The advantage is that the localization node doesn’t have to perform any subtraction, and can directly publish its output. Both the map → base_link and base_link → odom transforms being published have a clear meaning.
I’d like to hear from the community, especially from those that have worked on localization in ROS, about this alternative frame order.
Hi, thanks for posting. I’m very interested in the transform graph idea.
My understanding is that a tree is enforced for efficiency reasons, however I don’t have a feeling for how much slower tf lookups would be over a DAG (directed acyclic graph) compared to a tree.
On the other hand, I do have a feeling for how much developer time is lost by enforcing the REP-105 tree structure, as I’ve spent hours myself figuring out how to compute map->odom for a particular localisation system, and also some number of hours explaining how this all works to new ROS users, interns, etc.
Actually I’ve written a node to facilitate publishing map->odom (shameless plug), although it’d be preferable if this didn’t need to exist.
I’m curious if others think a general DAG-based tf would be a viable replacement for the current tree-based implementation, or whether this would not be workable for efficiency or other reasons.
A DAG would support multiple paths from map to the robot so you’d have conflicting constraints on the robot’s pose. How would your proposed DAG-TF compute the pose in this case?
The reason for the order of frames in localization is that in an ideal robot, map->odom would never change at all. In a real robot, it changes only at the rate at which odometry error accumulates.
The localization algorithms usually take time; that is, our latest knowledge about where the robot is with respect to the map is always outdated. With Map->Odom_0->Base we can just add the currently accumulated odometry and end up with the best current estimate without much “time travel” in tf.
It is maybe not intuitive to a beginner, but it’s the right thing to do. You have to consider the fact that frames have timestamps and things are moving.
@Kenji_Brameld: Thanks for the clearly-written post! You are absolutely right that this one is a classic roadblock for many newbies. I’ve been repeatedly explaining the map/odom/base_link idiosyncrasy to people for quite some time.
@tim-fan: Short clarification: The tf library is what’s enforcing the tree structure, not REP-105.
I agree that it would be nice if tf allowed more than one parent, because then the “intuitive” diagram from your figure would be possible. However, a problem with arbitrary DAGs is that they would allow “diamond” shapes where there are multiple different paths between two nodes in the graph. What if there’s an inconsistency between the transformations, depending on which path we take? This is why tf must ensure that there are no cycles in the graph even if you ignore the direction of the edges. In other words, it must be possible to convert the graph to a tree by flipping some edges. The design choice of tf was to only allow trees, and have the user flip the edges themselves.
I think the closest way to approximate the “intuitive” graph structure is to flip one or all edges, like you and @peci1 suggest. Actually, there is nothing stopping you from doing that right now, if you control both the odometry and map localization nodes. All “consumers” of localization should only care that there is a transform between map and their frame of choice, not what the tf graph looks like.
However, there is one disadvantage of doing that. Suppose the odom -> base_link transform is published at a very high frequency (let’s say 100 Hz, which isn’t unusual), while the map localization runs at a much slower frequency (like 1 Hz). This is probably the most common use case. In this case, the REP-105 structure means that the combined map -> base_link transform gets updated at 100 Hz, while with any variation of the “intuitive” structure, it only gets updated at 1 Hz.
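To make the rate argument concrete, here’s a minimal sketch in plain Python (1-D offsets and made-up numbers, not the real tf2 API): the composed map -> base_link pose refreshes on every 100 Hz odometry tick even though the map -> odom correction only changed once that second.

```python
# Sketch of the update-rate argument (1-D positions, made-up rates/values).
# odom->base updates at 100 Hz; the map->odom correction only at 1 Hz.
# Composing them still yields a fresh map->base estimate on every odom tick.

def compose(map_to_odom, odom_to_base):
    # In 1-D a transform is just an offset, so composition is addition.
    return map_to_odom + odom_to_base

map_to_odom = 0.0          # last correction from the (slow) localizer
poses = []
for tick in range(100):    # one second at 100 Hz
    t = tick / 100.0
    odom_to_base = 0.5 * t             # robot moving at 0.5 m/s per odometry
    if tick == 0:                      # localizer ran once at t=0 this second
        map_to_odom = 0.02             # small drift correction
    poses.append(compose(map_to_odom, odom_to_base))

# map->base changed on every 100 Hz tick, not just when the localizer ran:
assert len(set(poses)) == 100
```

With the “intuitive” (flipped) structure, a consumer of map -> base_link would instead see the pose jump only once per localizer cycle.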
The map to odom frame is accounted for as the Odometry Drift in AMCL. However in a localization system (that I’m working on) that can handle a kidnapped robot with no initial position estimate, accounting for this transform as Odometry Drift doesn’t make sense, and there seems to be no meaning to the transform itself.
All localization systems that I know of that are currently in use in ROS can produce sudden jumps / kidnapped robot situations, and REP-105 explicitly mentions that. You’re right that in that case the name “Odometry Drift” is misleading, but the principle still works.
@TRuehr and @Martin_Guenther
Sorry, you’re right, a DAG would allow multiple paths and hence conflicting constraints, which would be problematic. I’m in total agreement that this should be avoided.
To revise my comment (and please correct me again if there are holes in my understanding of graph theory):
Today, TF requires transforms to be published as a directed rooted tree. That is, no node may have more than one parent.
I suppose I’m suggesting to loosen the constraint such that a general polytree will be acceptable.
Basically, it should be acceptable for a single node to have more than one parent, as long as this does not create loops in the (undirected) tf graph. If my odom system publishes odom->base, and my localisation system publishes map->base, this should not lead to any tf errors - there is no conflicting information here. My understanding was that the tf library throws errors in this condition only due to efficiency reasons - maybe it is quicker to perform the lookups if we ensure a rooted tree structure.
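For what it’s worth, the polytree condition is cheap to check: it’s just “no cycle in the undirected graph”. A small union-find sketch (plain Python, hypothetical frame names) shows that odom->base plus map->base is fine, while additionally publishing map->odom would close a loop:

```python
# Sketch: checking whether a set of transform edges stays a polytree,
# i.e. the *undirected* graph has no cycle. Union-find over frame names.

def has_undirected_cycle(edges):
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            return True   # a second path between a and b would close a loop
        parent[ra] = rb
    return False

# odom->base and map->base: base has two parents, but no undirected cycle.
assert not has_undirected_cycle([("odom", "base"), ("map", "base")])
# Adding map->odom as well creates a diamond/loop in the undirected graph.
assert has_undirected_cycle([("odom", "base"), ("map", "base"), ("map", "odom")])
```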
@Martin_Guenther point well taken about the advantage of having a map->odom->base structure, in that this allows map->base to be updated by dead reckoning in between localisation updates.
EDIT @Martin_Guenther I guess put simply I’m curious as to whether it’s feasible to have the tf library ‘flip the edges’, rather than needing to do it in client code.
Yes, a polytree seems to be exactly what I meant, I just couldn’t find the proper term! I think it would certainly be possible to add support for polytrees to tf. Any thoughts, @tfoote?
For clarification: in my original post, I mentioned that the current implementation of tf2 is a tree, but that seems to be incorrect, since the definition of a tree (Wikipedia) is an “undirected graph in which any two vertices are connected by exactly one path”. The current implementation is a directed rooted tree, as @tim-fan mentioned, which by that definition is not a tree.
A tree, seems to be a good way to store transform information. Why is the current implementation:
directed?
rooted?
If this is because of efficiency issues in lookups (I’d like more information on this if available), can the tree be rearranged into a directed rooted tree under the hood (inside tf2) without it being apparent to the nodes that are publishing the transforms?
@Kenji_Brameld on the other hand, the wikipedia page for Tree(data_structure) shows a directed graph where each node has at most one parent. So I think by that definition you were correct in your original post!
Note section IV.A: Data Structures, which talks about the benefits of using this tree data structure:
To provide quick look ups the tree must be quickly searchable. Limiting the graph to a tree enables fast searching for connectivity. This becomes important as graph complexity increases. As an example of a complex tree is the PR2 Robot whose graph can be seen in Figure 2.
So my thinking that trees are enforced purely for efficiency reasons is based on that paragraph. Again, I’m not sure how significant the performance hit would be if we were to drop the strict tree constraint for usability reasons.
Specifically looking at the structure laid out in REP 105, as discussed: inserting the odom frame between map and base_link is valuable specifically because of the frequency and latency of updates that @Martin_Guenther laid out above. Full localization is much slower and higher latency than the odometry computations. By separating the two components of this and having the localization only compute the correction for the drift you can access the latest position of items in the map frame without waiting for the high latency localization updates.
A simple example of this being highly valuable is a vehicle that’s reliant on GPS for localization. When you drive through a tunnel you will lose your GPS signal. However, if you are navigating through the tunnel, the forks in the tunnel are in the map, and consequently you want to use the last known localization + odometry to ask where the fork is compared to the vehicle. If you don’t chain the transforms and instead have them on separate branches, then you get no updates about where you are in the map until you come out the other end of the tunnel and reacquire GPS. You can program your navigation algorithm to deal with this by detecting that the localization is not updating. Then query at that older time, find your pose in the odom frame. Then query where you are relative to that pose now. Back-compute your position in the last updated version of the map. And then compute the angle and heading to your desired landmark in the map now. This is a much more complicated way to solve the same problem and requires every implementation to duplicate the same logic.
Publishing the correction transforms is not particularly hard. It’s T(correction) = T(map->base) - T(odom->base)
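To spell out the “subtraction” for anyone implementing this: for rigid transforms it is really composition with an inverse, T(map->odom) = T(map->base) * T(odom->base)^-1. A minimal SE(2) sketch in plain Python (made-up poses, not the tf2 API):

```python
import math

# SE(2) poses as (x, y, theta). The tf "subtraction" above is really
# composition with an inverse:  T(map->odom) = T(map->base) * T(odom->base)^-1

def compose(a, b):
    ax, ay, at = a
    bx, by, bt = b
    c, s = math.cos(at), math.sin(at)
    return (ax + c * bx - s * by, ay + s * bx + c * by, at + bt)

def invert(t):
    # [R t; 0 1]^-1 = [R^T, -R^T t]
    x, y, th = t
    c, s = math.cos(th), math.sin(th)
    return (-(c * x + s * y), s * x - c * y, -th)

# Made-up example poses:
map_to_base = (2.0, 1.0, math.pi / 2)   # localizer's estimate
odom_to_base = (1.5, 0.9, math.pi / 2)  # accumulated odometry

map_to_odom = compose(map_to_base, invert(odom_to_base))

# Sanity check: chaining the correction with odometry recovers the estimate.
recovered = compose(map_to_odom, odom_to_base)
assert all(abs(r - e) < 1e-9 for r, e in zip(recovered, map_to_base))
```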
I would strongly recommend that this approach still be used regardless of whether the tree structure were a limitation or not. This structure was born from many experiences dealing with localization issues, both in full-size autonomous vehicles as well as indoor robots such as the PR2. It is not the simplest for teaching, but it is designed to be the most robust.
TF Tree Data Structure
Thanks @tim-fan for referencing the paper where a lot of this is already laid out.
There are several dimensions of efficiency that the data structure helps with, as well as robustness. Queries of a directed tree are very efficient.
Query efficiency
Queries in a directed tree are O(d), where d is the depth of the tree. There’s further analysis of querying a specific time in history, but I’m going to exclude that as it’s independent of the data structure. If this were changed to a generic polytree, lookups would kick up to closer to O(n), where n is the number of nodes in the graph.
Maintaining the history of each frame is the most common operation, so it needs to be very low overhead. On receipt of new data it’s typically just appending to a history, which is O(1); in the worst case it’s O(k), where k is how far into the history vector the out-of-order data belongs.
Pruning is removing from the back of the list, which is also an O(1) operation and is done continuously in the background.
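For intuition about the O(depth) query cost, here’s a toy lookup over a directed tree (plain Python, 1-D offsets standing in for transforms; not the real tf2 code): both frames walk their parent pointers to the root, and the two partial results are composed.

```python
# Sketch of a directed-tree lookup: walk parent pointers from both frames
# to the root, then compose the two paths. Cost is O(depth), not O(#nodes).
# Transforms are 1-D offsets here to keep the sketch short.

tree = {  # frame -> (parent, offset of frame's origin in the parent frame)
    "odom": ("map", 0.1),
    "base_link": ("odom", 2.0),
    "laser": ("base_link", 0.3),
}

def path_to_root(frame):
    chain = []
    while frame in tree:
        parent, offset = tree[frame]
        chain.append((frame, offset))
        frame = parent
    return chain, frame  # frame is now the root

def lookup(target, source):
    # Offset of `source` origin expressed in `target` (1-D: subtraction).
    src_chain, src_root = path_to_root(source)
    tgt_chain, tgt_root = path_to_root(target)
    assert src_root == tgt_root, "frames are not connected"
    src_total = sum(off for _, off in src_chain)   # source in root frame
    tgt_total = sum(off for _, off in tgt_chain)   # target in root frame
    return src_total - tgt_total

assert abs(lookup("map", "laser") - 2.4) < 1e-12   # 0.1 + 2.0 + 0.3
assert abs(lookup("odom", "laser") - 2.3) < 1e-12  # 2.0 + 0.3
```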
Unfortunately it is not just efficiency that relaxing the constraint would affect. You start getting into the problem of cycles in the graph. There are fundamental problems if two nodes publish transforms which cause a loop/diamond in the graph: there’s no way to consistently resolve a traversal through the tree unless you start encoding heuristics, or you need to start creating priorities or some other mechanism to select branches. Who should get the errors? Should queries fail and throw an exception if the tree has multiple paths through it?
Currently there is no requirement for the library to evaluate incoming transforms and the directed tree can be assembled at the time of query efficiently. The only way that the graph can be malformed is if a cycle exists which is clearly problematic, but also is quick to detect and generally relatively hard to do accidentally.
If we want to add support for a more complex graph structure, we need to start thinking about how to communicate that more complex structure. Right now transforms only declare their parent, which produces a directed graph. If a transform changes parents at time T, then a query before time T walks the old parent, and at time >= T it walks the new parent. Without that restriction we would need to keep track of breaking parent links explicitly, which means extending the published data to include more information to explicitly break links, as they’re no longer implicitly broken.
And as we start to get stateful changes with explicit starts and stops, we need to make sure that our system is robust to running in a distributed, lossy system in which not all messages are guaranteed to arrive, nor to arrive in a specific order or sequence. So we need to start thinking about having a reliable transport or other mechanism to transport the transform information. TF is currently eventually consistent, and any lost messages will have only a very local effect.
The nice thing with the directed tree is that it does not need any maintenance and can be built dynamically while it’s queried, yet stays efficient. If you go to a more advanced graph, then to keep the operations efficient you’ll need to start maintaining auxiliary data structures to keep queries quick, which starts accumulating computational costs. This is especially true since the graph may have a different structure at any given time, so you need to maintain parallel graphs any time there’s a change in connectivity. And if the maintenance process encounters any problems with the graph, who gets the error or exception? The only place the feedback can effectively come out is the querying process, whose query will fail, and potentially some global warning/error messages. And how do you deal with race conditions in building the graph and resolving errors in the graph? Data arriving over a higher-latency connection might arrive later. What happens if a state-change message is lost or dropped? Does the tree stay disconnected or connected? For how long is an old connection still valid and walkable?
At a high level there’s basically a lot of other complexities that come in if you relax this constraint. I don’t think that they’re insurmountable, but I don’t have a clear picture of how all these additional complexities will benefit end users and worry that they will actually hurt end users with at minimum more computational load and potentially an even more complex interface which has to expose the higher complexity to the end user API.
The difference I see here is that tf “clients”, aka end users, don’t care about the structure or direction of the graph; a lookup resolves correctly whichever way you query it. The one requirement falls on tf publishers, which are each responsible for their link. “Flipping” a link is just publishing the inverse, which is typically trivial if you’re using any linear math library. And as mentioned, differencing before you publish a lower-frequency transform is also relatively low cost. There are not many nodes which need to publish transforms for localization. And as you’ve demonstrated, the logic can be written completely generically in about 15 lines of code and embedded into any node before publishing the transform, including a fully parameterized generic one.
Thanks @tfoote for your detailed answer. Some of the links that have been posted in this thread were difficult to find just from searching, so I appreciate you posting the links to those discussions.
inserting the odom frame between map and base_link is valuable specifically because of the frequency and latency of updates that @Martin_Guenther laid out above. Full localization is much slower and higher latency than the odometry computations. By separating the two components of this and having the localization only compute the correction for the drift you can access the latest position of items in the map frame without waiting for the high latency localization updates.
This is something I wanted to post in response to @Martin_Guenther earlier, but I waited until @tfoote’s response. Here’s my view on this.
Robot localization should be about where the robot is now. With new odometry updates coming in, it should be the responsibility of robot localization to publish a new transform from map → base_frame when new information is available, or at a high enough rate to be useful to the user. It shouldn’t be publishing at the rate of the slower sensors (i.e. GPS, laser or camera).
I’m guessing when @tim-fan was referring to client code, he meant the node that’s publishing the transforms. There are not many nodes which need to publish transforms for localization, but there can be many localization libraries written for different environments and different sensors. How difficult would it be to put this into the tf2 side and not the node that’s publishing the transform?
That’s a wonderful ideal world. Unfortunately, in the real world there’s latency in the data arriving from the sensors, and you never know where you are “now”; you can only estimate where you were at the time the sensors sensed the data, and you can only do that after the sensor data has been transferred to the estimator. It is possible to build models and predict where you might be “now”, but this requires extrapolating into the future, will have much greater errors, and the estimate will change when the sensor data arrives.
It is actually quite difficult to encode into a distributed system how to semantically interpret the estimates in the precise way that any given localization algorithm’s solution has been computed. That’s why we have established the relatively specific standard of transform frames and the REP 105 standard, so that everyone can have the same understanding of the state of the system. Since there are only a few nodes doing this, they can implement the simple logic to correctly semantically interpret and standardize the output. If there’s enough demand for a helper function for the publisher to do the subtraction automatically and publish the correction transform, that would be a welcome pull request. However I have not previously heard a request for that sort of functionality.
That’s not what I meant, I’m not talking about performing an extrapolation into the future with a prediction. What I mean is having localization publish the base_link → map transformation when the latest sensor information comes in (ie. the latest odometry or any other information it received).
I’m not too surprised considering the low number of localization packages that this hasn’t been implemented. Good to know that a PR is welcome for this.
I’ve seen that a lot of systems stamp the map → odom transform in the future (e.g. AMCL’s transform_tolerance). I think this is done so that tf lookups of map → base_link will implicitly extrapolate the position of the robot when new odometry data is received. Shouldn’t this extrapolation be done by the localization system (for example AMCL)? The localization system has a much better model for this. This would advocate for publishing the map → base_link transform from the localization node.
Actually it is AMCL that does that. TF merely accepts the future timestamp; the node that actually sets it is AMCL itself. The motivation to timestamp in the future (as outlined in tf/FAQ - ROS Wiki, Q10 and Q18) always sounded a little bogus to me; it’s a hack to work around the side-effect explained in Q10. In my own localization nodes, I never future-date map-odom.
The purpose of the timestamp on any topic is to reflect the age of the information that led to the quantity being published. In this case it’s the last known correction of the odom frame origin. So if the robot figures out its absolute position once and never again and remains on dead reckoning forever, the timestamps in the transform tree should reflect that (map-odom should reflect the last time the correction was learned, and odom-base should reflect the arrival of odometry information). For this reason it makes sense to have the map-odom-base chain, in which each link reflects information from a different source (and you get one frame that is guaranteed to be continuous).
Putting a future timestamp on the localization estimate is indeed a dirty hack. If you set the tolerance too high, your robot position might reflect an older localization estimate. But as we have seen before, the map->odom transform should only drift slowly as odometry error accumulates, so the error should be negligible in reality. It could be avoided by adding some mechanism for invalidating old tf messages or something along those lines, but I guess the current way of being a little “late” but smoothly interpolating might be preferred for control in many cases despite the theoretical error.
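A toy illustration of why the future stamp “works” (plain Python, 1-D offsets and made-up stamps; real tf2 interpolates full 3-D transforms): tf interpolates between buffered stamped transforms but refuses to extrapolate past the newest one, so re-stamping the unchanged correction into the future keeps queries at “now” inside the buffer.

```python
# Minimal sketch: tf-style lookup interpolates between two buffered stamped
# transforms, but will not extrapolate beyond the newest one.

def interpolate(buffer, query_time):
    # buffer: time-sorted list of (stamp, offset)
    times = [t for t, _ in buffer]
    if not (times[0] <= query_time <= times[-1]):
        raise LookupError("would require extrapolation")
    for (t0, x0), (t1, x1) in zip(buffer, buffer[1:]):
        if t0 <= query_time <= t1:
            a = (query_time - t0) / (t1 - t0)
            return x0 + a * (x1 - x0)

# map->odom computed at t=0.0, republished with a future stamp t=1.5
# (the transform_tolerance trick): the buffer spans [0.0, 1.5].
buffer = [(0.0, 0.02), (1.5, 0.02)]
# A lookup at "now" (t=1.2) succeeds, returning the (slightly old) correction:
assert interpolate(buffer, 1.2) == 0.02
# Without the future-dated sample, the same lookup would fail:
try:
    interpolate([(0.0, 0.02)], 1.2)
    raised = False
except LookupError:
    raised = True
assert raised
```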
When you are talking about a better model, you mean amcl has a model of odometry drift, right? Ideally you would make your odometry error zero-mean through calibration before odometry goes into amcl, in which case I don’t think amcl would have a better model.
That’s also not exactly possible. There are many different potential sensors that are being integrated with different arrival frequencies. You can do first pass odometry and compute a best estimate on a rolling basis. But if you then do something like close the loop in an SLAM algorithm you will get a high latency jump in your localization after you run a large optimization on potentially a large graph. If you don’t wait for the full computation or the slowest sensor you are effectively extrapolating forward based on partial information. This is why it’s valuable to separate the high rate incremental updates from the higher latency higher accuracy corrections.
(Apologies if this posts twice; I tried to reply by email, and apparently it was silently dropped. I’ll delete this one if a second appears.)
Bayesian filters, be they Kalman or particle or whatever, always have interleaved:
(1) prediction steps (robot_localization/Xkf_localization in particular assume rigid-body Newton-Euler equations), which propagate uncertainty as velocity-type information comes in (the raw IMU or wheel twists underlying an integrated odometry stream), and
(2) some form of Bayes rule, which multiplies this prior by the likelihood ratio of the newly-observed sensor data.
According to both answers here, the real reason for the transform_tolerance is more mundane: it’s to account for the computing time taken by AMCL itself. I’m not sure, however, why they don’t just measure this and publish the right time themselves.
However, (1) and (2) aren’t really meant to be separate: internally, the localizer will be estimating a (distribution over) map->base_frame, and then subtracting out an odom->base_frame that is published by some other node (and which, yes, duplicates a small part of the work of (1)), before publishing a point estimate of map->odom to tf (and optionally something more distributional like PoseWithCovarianceStamped on some other topic).
The (1) part will probably be running at a higher rate than (2) internally, so the localizer could do the simplistic odometry integration itself and publish the odom->base_link tf. But it makes sense to keep it separate in case we want to use something fancier for that purpose (like another instance of robot_localization, a running-mean filter, or an Intel T265 device).