REP 105 and the outputs of localisers

I was looking at REP 105 today to understand what a localisation node should produce for maximum compatibility with the ROS ecosystem, and I’ve come to the conclusion that we need to specify things further, either in REP 105 or in another REP.

In REP 105 section “Frame Authorities”, it says this:

The transform from odom to base_link is computed and broadcast by one of the odometry sources.

The transform from map to base_link is computed by a localization component. However, the localization component does not broadcast the transform from map to base_link. Instead, it first receives the transform from odom to base_link, and uses this information to broadcast the transform from map to odom.

This implies that transforms are how position information is published and consumed throughout a navigation system in ROS.
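For concreteness, here’s a rough numpy sketch of the arithmetic I understand that quoted paragraph to be describing: the localiser estimates map -> base_link, receives odom -> base_link, and broadcasts the difference as map -> odom. The numbers are made up and I’m assuming an [x, y, z, w] quaternion convention.

```python
import numpy as np

def quat_to_mat(q):
    """Convert a unit quaternion [x, y, z, w] to a 3x3 rotation matrix."""
    x, y, z, w = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - z*w),     2*(x*z + y*w)],
        [2*(x*y + z*w),     1 - 2*(x*x + z*z), 2*(y*z - x*w)],
        [2*(x*z - y*w),     2*(y*z + x*w),     1 - 2*(x*x + y*y)],
    ])

def to_homogeneous(translation, quaternion):
    """Build a 4x4 homogeneous transform from a translation and quaternion."""
    T = np.eye(4)
    T[:3, :3] = quat_to_mat(quaternion)
    T[:3, 3] = translation
    return T

# map -> base_link: the localiser's own pose estimate (made-up numbers,
# identity rotations kept for readability).
T_map_base = to_homogeneous([2.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0])
# odom -> base_link: received from the odometry source via TF.
T_odom_base = to_homogeneous([1.5, 0.9, 0.0], [0.0, 0.0, 0.0, 1.0])

# map -> odom = (map -> base_link) * (odom -> base_link)^-1,
# which is the transform the localiser actually broadcasts.
T_map_odom = T_map_base @ np.linalg.inv(T_odom_base)
print(T_map_odom[:3, 3])  # translation part of the map -> odom correction
```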

What is tripping me up is that this is not consistent with what I’ve seen elsewhere in the ROS ecosystem, or with what I remember using myself over the years.

Although memories can be blurry and impacted by a lack of caffeine, looking around a bit showed that:

  • AMCL in ROS 1 outputs a PoseWithCovarianceStamped and broadcasts a transform via tf
  • The navigation2 stack’s AMCL does the same thing
  • The robot_localization package takes in PoseWithCovarianceStamped and TwistWithCovarianceStamped and produces the same message types (I think; @smac will correct me if I’m wrong).

So what should localisation sources publish? Should they publish PoseWithCovarianceStamped, or should they broadcast transforms, or should they do both?

The catch with broadcasting transforms, as @cho3 pointed out to me, is that there is no covariance information, which makes it hard to do things like combine multiple sources and filter data.
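A quick sketch of the asymmetry in the stock geometry_msgs types:

```python
from geometry_msgs.msg import PoseWithCovarianceStamped, TransformStamped

pose_msg = PoseWithCovarianceStamped()
pose_msg.pose.covariance = [0.0] * 36  # 6x6 row-major covariance: there's a slot for it

tf_msg = TransformStamped()
# TransformStamped only carries header, child_frame_id, and transform;
# there is no covariance field anywhere in the tf message chain.
```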

Are transforms meant only to be broadcast by the end of the localisation chain, after all localisation data sources have been filtered and combined?

I’m not exactly sure where that comment comes from. Can you clarify? From the quoted REP 105, I think the wording is not ideal; it’s a little verbose for just trying to say: the transform from map to odom is provided by a localization (or SLAM, mapping) component. The rest of the words about how that transform is computed are unnecessary. Also, the fact that it doesn’t mention SLAM, only localization, is interesting. I’m not sure if that was an intentional exclusion or just wasn’t on the mind at the time of writing.

Just because AMCL also produces a PoseWithCovarianceStamped doesn’t mean that output is required for REP-105 compliance. You can be compliant and then also provide additional information.

Those are indeed two valid options for robot_localization to fuse. In the situation you present, where you’re using RL as a localizer, RL itself is now the “localization component”, not the inputs to RL. It’s then on RL to produce the transformation to be REP 105 compliant. Regardless, the additional publication of other messages (i.e. PoseWithCovarianceStamped) is not necessary and shouldn’t be explicitly required for the minimum REP standard, in my opinion.

To address your last 3 paragraphs respectively, and to summarize my opinion:

  • They should only publish transforms to be minimally compliant
  • If you’re fusing multiple sources of data together, using RL or otherwise, then that fused output is now the localization transformation source, and it is that output that should be compliant with REP-105, not the components that feed into it.
  • As with publishing additional information (like PoseWithCovarianceStamped), you’re welcome to also publish intermediate calculations for other frame transformations. However, at the end of the day, a source should be providing the map -> odom transformation. The intermediate steps or additional things you do are application specific and within the black box of a localizer as far as the rest of your system is concerned.

Separately, if you’re asking what I think a localizer should produce, I think that includes a PoseWithCovarianceStamped. But REP-105 is concerned with frame conventions, not localizer conventions. If you wanted to add the requirement to publish a pose, that should be a separate REP, since REP-105 deals exclusively with frames, usually in TF, which doesn’t take in covariance information. If you’re working with multiple localization components, fusing them, and outputting a composite localization component, I agree each of them should be compliant so they are independently valid as localization components themselves (to have reusability). However, as far as the block diagram of your system is concerned, the “localization component” is the entire entity: the individual elements going into the fusion method and then served to the system. So REP-105 applies to the entirety of it, which is really then just RL, and the individual localization components do not need to strictly follow the existing REP (but should, if you don’t want to make trouble for yourself).

Summary of this opinion:

  • Probably, yeah: we should make a new REP for localizers / mappers to provide not only REP-105 frames, but also the pose with covariance on each update for potential upstream use.
  • Technically, as far as REP-105 is concerned, if you use AMCL and a few other methods fused together, neither AMCL nor the other methods are localization components. Only the fusion of them is the localization component required to publish the frames.
  • I see no drawbacks to this, but I’m not sure it should be strictly required. I think it should be an option, called out with an explanation of why you might want to include the PoseWithCovarianceStamped. If you’re using a single algorithm, what you propose doesn’t matter to the user.

This is the crux of my question. You can produce a pose or a transform, with or without a covariance, and there are advantages to each, but REP 105 only allows one.

  • If you produce a pose, it’s in one defined frame so you don’t need to know about the TF tree until you are a consumer. The publisher doesn’t care about what its parent in the tree is.
  • If you produce a transform, you need to know where you are in the tree, either via configuration or by querying TF2 (see the sketch after this list). This makes it more annoying for the publisher and harder to just shove a localisation source into your system and let the consumers decide what frame they want the information in. On the other hand, it makes things more absolute in terms of knowing exactly what the message means, which is a bonus for debugging.
  • If you produce a pose, it’s easy to attach covariance. If you produce a transform, which REP 105 asks for, you can’t include covariance data which makes REP 105 only applicable to the end of a localisation pipeline, even though it may not be.
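To illustrate the first two bullets, here’s a rough ROS 2 sketch of the consumer side, where the consumer, not the publisher, decides the frame. The node and topic names are made up.

```python
import rclpy
from rclpy.node import Node
from rclpy.time import Time
from geometry_msgs.msg import PoseWithCovarianceStamped
from tf2_ros import Buffer, TransformListener

class PoseConsumer(Node):
    def __init__(self):
        super().__init__('pose_consumer')
        self.tf_buffer = Buffer()
        self.tf_listener = TransformListener(self.tf_buffer, self)
        self.sub = self.create_subscription(
            PoseWithCovarianceStamped, 'pose', self.on_pose, 10)

    def on_pose(self, msg):
        # The consumer, not the publisher, decides what frame it wants the
        # data in: look up the transform from the message's frame to 'map'.
        t = self.tf_buffer.lookup_transform(
            'map', msg.header.frame_id, Time())
        # ... apply t to msg.pose.pose as needed ...

def main():
    rclpy.init()
    rclpy.spin(PoseConsumer())
```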

I guess the best solution, then, is that any localisation system should produce both: the PoseWithCovarianceStamped output being used if you feed its result into another localisation system, and the TF transform being used if it’s the end result. That way, that section of REP 105 only applies to end-of-pipeline localisation results.
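Something like this rough rclpy sketch, with the transform broadcast made optional for the intermediate case. The node name and the publish_tf parameter are just illustrative.

```python
import rclpy
from rclpy.node import Node
from geometry_msgs.msg import PoseWithCovarianceStamped, TransformStamped
from tf2_ros import TransformBroadcaster

class DualOutputLocaliser(Node):
    def __init__(self):
        super().__init__('dual_output_localiser')
        # Pose output, for feeding downstream fusion (e.g. robot_localization).
        self.pose_pub = self.create_publisher(PoseWithCovarianceStamped, 'pose', 10)
        # Transform output, for REP 105 compliance at the end of the pipeline.
        self.tf_broadcaster = TransformBroadcaster(self)
        # Turn tf broadcast off when feeding another localisation stage.
        self.declare_parameter('publish_tf', True)
        self.timer = self.create_timer(0.1, self.publish_estimate)

    def publish_estimate(self):
        now = self.get_clock().now().to_msg()

        pose = PoseWithCovarianceStamped()
        pose.header.stamp = now
        pose.header.frame_id = 'map'
        # ... fill pose.pose.pose and pose.pose.covariance from the estimator ...
        self.pose_pub.publish(pose)

        if self.get_parameter('publish_tf').value:
            tf = TransformStamped()
            tf.header.stamp = now
            tf.header.frame_id = 'map'
            tf.child_frame_id = 'odom'
            # ... fill tf.transform from the same estimate ...
            self.tf_broadcaster.sendTransform(tf)

def main():
    rclpy.init()
    rclpy.spin(DualOutputLocaliser())

if __name__ == '__main__':
    main()
```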

@DLu Any thoughts on this topic?

Also @tfoote.

I’ve realised I’m probably misusing the word “localiser” in the above posts. Everywhere you see “localiser”, think “source of robot position information”. I’m not just concerned with things that localise in a map, but also things like IMUs, GNSS sensors, visual odometry updates, and so on. Some produce an instantaneous pose in their coordinate system and some produce an offset since their last frame.

That drastically changes the scope of the discussion, and likely invalidates my commentary above.

With that said, all the examples you provided above have standard messages in sensor_msgs or geometry_msgs which already include covariance information.
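For example, for the IMU and GNSS cases, a quick check of the covariance slots in the stock sensor_msgs types:

```python
from sensor_msgs.msg import Imu, NavSatFix

imu = Imu()
assert len(imu.orientation_covariance) == 9          # 3x3 row-major
assert len(imu.angular_velocity_covariance) == 9     # 3x3 row-major
assert len(imu.linear_acceleration_covariance) == 9  # 3x3 row-major

fix = NavSatFix()
assert len(fix.position_covariance) == 9  # plus a position_covariance_type field
```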

I think if you want to expand the scope, we’d need to create some coherent narrative or theme that ties all of those sensor / derivative / localization / motion estimators together. I think that might be a challenge. I would restrict it to localizers. When you start talking about things that just compute changes in pose, it becomes unclear whether that should apply to the map->odom or odom->base_link transform. And in fact, they could be applied more conventionally to the odom->base_link problem.

And in fact, they could be applied more conventionally to the odom->base_link problem.

+1

The main link in your transform tree defined in the REP should be your final fused best estimate of the combined result, published as the tf transform. The most robust localization will usually be a fusion of multiple sources with different uncertainties. If these are broken apart and structured in different nodes, they will need to communicate that information through various mechanisms, such as PoseWithCovariance messages. They can potentially provide even more information. A lot of these interfaces are often de facto provided by the first implementation. I agree with @smac that a new REP to define a standard way to share different odometry or localization estimates would likely be valuable as we get more implementations and start to consider different domains.

If you’re providing an intermediate localization solution, the transform output can be suppressed, or it is often published as a child/leaf transform for convenient debugging and visualization.

This is not excluding SLAM. SLAM is simultaneous localization and mapping; it is just one class of localization algorithm that happens to be building the map at the same time.

The goal of REP 105 is that users of the transform frames don’t need to worry about the choices made by the implementers of the localization methods. It is the responsibility of the developer setting up the localization to make sure there’s an appropriate end product, so that downstream developers can rely on the generic frames. The main point of the REP is to avoid the problem of “let the consumers decide what frame they want the information in”. The consumers have a list of frames with specific semantic meaning that are defined in the REP. If your implementation provides an additional frame with different semantics, that’s great, and people can use it. But if it’s not in the standard, any code written against it will not translate to other implementations. As the implementer you can add any complexity you want or need. The ability to “just shove a localization source into your system” should not change the output that downstream developers expect.

The majority of consumers of transform information do not care about anything other than that they have the best estimate possible of the odometry or localization available. And the REP just defines specifically that output.

Indeed, the majority of fusion right now is focused on the odometry layer, usually merging wheel encoders, IMUs, and potentially visual odometry. There are different semantics for localization versus odometry, but there are also a lot of primitives that overlap, especially when implementing various SLAM algorithms. A REP to homogenize either or both would be valuable.

I think this was the source of my confusion. REP 105 applies specifically and solely to the end product, and I was mentally trying to fit the entire pipeline up to that point into its model.

I have nothing useful to add, but will loop in @automatom

It looks like there isn’t a lot to add at this point, but as long as I’m here:

The robot_localization package takes in PoseWithCovarianceStamped and TwistWithCovarianceStamped and produces the same message types (I think; @smac will correct me if I’m wrong).

The package takes in all sorts of messages: Odometry, PoseWithCovarianceStamped, TwistWithCovarianceStamped, Imu, and AccelWithCovarianceStamped. It outputs:

  • A transform from odom -> base_link, if the world_frame parameter is set to odom, OR a transform from map -> odom, if the world_frame parameter is set to map. You can turn off the transform broadcast if you want.
  • The filter is producing a full state estimate, including velocity and acceleration, so it publishes an Odometry message with pose and velocity, and, optionally, an AccelWithCovarianceStamped message.

To me, the only required output of a localizer or odometry node should be the transforms mentioned in REP-105. We just publish the other data because it can be more convenient for consumers to fire up a single topic sub than to instantiate a TF2 buffer and use that. It also lets you visualize things using rviz (yes, you can visualize the transforms, but you can’t persist N of them like you can with the other types). I’m all for coming up with (and adhering to) any new REPs that come out of this discussion.
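To illustrate that convenience point, consuming the fused output is just a plain subscription. A sketch assuming ROS 2 and the package’s default odometry/filtered topic name:

```python
import rclpy
from rclpy.node import Node
from nav_msgs.msg import Odometry

class FusedOdomConsumer(Node):
    def __init__(self):
        super().__init__('fused_odom_consumer')
        # robot_localization's state estimation nodes publish their fused
        # estimate on odometry/filtered by default.
        self.sub = self.create_subscription(
            Odometry, 'odometry/filtered', self.on_odom, 10)

    def on_odom(self, msg):
        # Full state estimate: pose + twist, each with a 6x6 covariance.
        self.get_logger().info(f'x = {msg.pose.pose.position.x:.2f}')

def main():
    rclpy.init()
    rclpy.spin(FusedOdomConsumer())
```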