Localization Architecture

Currently, Autoware doesn't work this way, but really, a generic gps_localizer should produce a Position or Pose (depending on the GPS hardware) in the “earth” frame using ECEF coordinates. Then the above-mentioned EKF node can take this Pose and update the map->odom->base_link TFs accordingly.
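For concreteness, the geodetic-to-ECEF conversion such a gps_localizer would do is only a few lines (a minimal sketch assuming the WGS-84 ellipsoid; the function and constant names are mine, not from any Autoware code):

```cpp
#include <cmath>

// WGS-84 ellipsoid constants.
constexpr double kSemiMajorAxis = 6378137.0;               // a [m]
constexpr double kFirstEccentricitySq = 6.69437999014e-3;  // e^2

// Convert geodetic coordinates (lat/lon in radians, alt in metres)
// to ECEF coordinates in metres, i.e. a position in the REP 105
// "earth" frame.
void geodeticToEcef(double lat, double lon, double alt,
                    double & x, double & y, double & z)
{
  const double sin_lat = std::sin(lat);
  const double cos_lat = std::cos(lat);
  // Prime vertical radius of curvature.
  const double n = kSemiMajorAxis /
    std::sqrt(1.0 - kFirstEccentricitySq * sin_lat * sin_lat);
  x = (n + alt) * cos_lat * std::cos(lon);
  y = (n + alt) * cos_lat * std::sin(lon);
  z = (n * (1.0 - kFirstEccentricitySq) + alt) * sin_lat;
}
```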

This would also allow more flexibility when using Autoware anywhere in the world (currently, I’ve heard stories of people having to fake their GPS data and pretend they are in Japan to use Autoware without modification). All the user has to do is provide the earth → map TF, and then you can run Autoware anywhere you want. Last week I wrote a quick GPS-only localization node (no EKF) to see if I could localize without ndt_matching at all (in case you are in a featureless environment). Autoware is currently operating just fine (more testing is still needed, though) using the methodology I described, thanks to the decoupling that TF trees provide (if everything operates in the map frame, it doesn’t matter where the map frame is). I’m hoping to bring some of this work into Autoware, but we need to define this EKF node first.
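Providing the earth → map TF can then be as simple as latching a single static transform. A minimal sketch using ROS 2's tf2_ros (the node name and numbers are placeholders; in practice the translation and rotation come from wherever your map origin sits in ECEF):

```cpp
#include <memory>

#include <geometry_msgs/msg/transform_stamped.hpp>
#include <rclcpp/rclcpp.hpp>
#include <tf2_ros/static_transform_broadcaster.h>

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("earth_map_broadcaster");
  tf2_ros::StaticTransformBroadcaster broadcaster(node);

  geometry_msgs::msg::TransformStamped tf;
  tf.header.stamp = node->now();
  tf.header.frame_id = "earth";  // ECEF, per REP 105
  tf.child_frame_id = "map";
  // Placeholder values: the ECEF position (and orientation) of the
  // local map origin.
  tf.transform.translation.x = -3961191.0;
  tf.transform.translation.y = 3348186.0;
  tf.transform.translation.z = 3698067.0;
  tf.transform.rotation.w = 1.0;

  broadcaster.sendTransform(tf);
  rclcpp::spin(node);
  rclcpp::shutdown();
  return 0;
}
```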

I’m not following you on that; wouldn’t TR (turn rate) from CATR be the same as angular velocity about z (i.e. yaw rate)? In the link you provided, is that just an example of a CATR motion model, or the one actually in use for tracking purposes? The lack of orientation (yaw) surprises me; wouldn’t we want to filter the direction of detected/tracked objects? Or is the orientation just passed through from object detection? I’ve never worked closely on object tracking code, so I may be overestimating what is possible. I know detection noise makes it hard to get good state estimates of objects, so people tend to opt for simpler EKFs.

This is the first I’ve heard of the CATR acronym, and I’m definitely not a localization expert (it’s been years since I’ve needed to look closely at an EKF), but it sounds similar to what we’ve used in the past which was basically a really simple bicycle motion model with constant acceleration (this was for ego state estimation, not sure what we used for detected object tracking).
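For reference, the prediction step of that constant-acceleration bicycle model is roughly the following (my own sketch from memory, not code from any particular project):

```cpp
#include <cmath>

// Minimal kinematic bicycle state with constant acceleration.
struct BicycleState
{
  double x;      // position [m]
  double y;      // position [m]
  double yaw;    // heading [rad]
  double v;      // longitudinal speed [m/s]
  double a;      // longitudinal acceleration [m/s^2], held constant
  double delta;  // front steering angle [rad]
};

// Propagate the state forward by dt [s] with a simple Euler step,
// given the vehicle wheelbase [m].
BicycleState predict(BicycleState s, double dt, double wheelbase)
{
  s.x += s.v * std::cos(s.yaw) * dt;
  s.y += s.v * std::sin(s.yaw) * dt;
  s.yaw += (s.v / wheelbase) * std::tan(s.delta) * dt;
  s.v += s.a * dt;
  return s;
}
```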

My bad @Ian_Colwell.

  1. Turn rate is indeed the angular velocity.
  2. This package is indeed an actual CATR motion model implementation (see the sketch below).
  3. I was wrong regarding the state. In fact, we use the exact same state struct as you describe above: Files · master · AutowareAuto / AutowareAuto · GitLab. The only difference is that the orientation is represented as a heading rather than a full 3DOF orientation.
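For anyone else following along, an Euler-discretized CATR (constant acceleration and turn rate) prediction step looks roughly like this (my own sketch; the actual AutowareAuto implementation may differ in discretization and state layout):

```cpp
#include <cmath>

// CATR: acceleration and turn rate are held constant between updates.
struct CatrState
{
  double x;          // position [m]
  double y;          // position [m]
  double heading;    // [rad]
  double v;          // speed along the heading [m/s]
  double a;          // acceleration [m/s^2]
  double turn_rate;  // angular velocity about z [rad/s]
};

// Simple Euler propagation of the CATR model over dt [s].
CatrState predict(CatrState s, double dt)
{
  s.x += s.v * std::cos(s.heading) * dt;
  s.y += s.v * std::sin(s.heading) * dt;
  s.heading += s.turn_rate * dt;
  s.v += s.a * dt;
  return s;
}
```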

LMK if this clarifies it.

D.

Ah, that makes more sense; all I had previously looked at was motion_model-design.md.
The state that you linked looks great for tracked objects!

For ego localization, I suggested a 3DOF orientation for a couple of reasons:

  • We found pitch was useful in speed control to determine whether the car was on a hill or not (see the sketch after this list).
  • I am unfamiliar with the current ndt_matching, but most lidar localization implementations work better when you have roll and pitch, since small roll or pitch angles can drastically change the lidar scan used for matching. I think ndt_matching estimates Pose with a full orientation anyway.
  • Novatel SPAN devices (or most GPS-INS systems) provide a full 3DOF orientation, so we might as well use it.
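As a concrete illustration of the first point, extracting pitch from a full 3DOF orientation is a one-liner with tf2 (a sketch; the threshold value here is hypothetical):

```cpp
#include <cmath>

#include <tf2/LinearMath/Matrix3x3.h>
#include <tf2/LinearMath/Quaternion.h>

// Returns true if the ego vehicle is pitched beyond a (hypothetical)
// hill threshold, given its orientation as a quaternion.
bool onHill(const tf2::Quaternion & orientation)
{
  double roll = 0.0, pitch = 0.0, yaw = 0.0;
  tf2::Matrix3x3(orientation).getRPY(roll, pitch, yaw);
  constexpr double kHillPitchThreshold = 0.05;  // rad, roughly 3 degrees
  return std::abs(pitch) > kHillPitchThreshold;
}
```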

One other difference I noticed is that the Autoware.Auto tracker EKF does not estimate Z position. This is probably fine; the z value could just be passed through from detection. However, for ego localization, the Z position is definitely needed.

Thanks to everyone who posted in this topic! I definitely learned a lot reading through it.

That said, for Autoware.Auto, we’re working on updating the localization stack. In particular, for the upcoming hackathon in London (Sept. 25, 2019), we’re hoping to get NDT implemented. To support this, I’ve been working on a design doc for a localization architecture that should hopefully support multiple use cases and sensing modalities, and I would love to get some feedback on this.

The high-level summary is below:

The Autoware.Auto localization stack should be compliant with REP 105, with the following frames defined:

  • /earth
  • /map
  • /odom
  • /base_link
  • Additional coordinate frames may be attached to the /base_link frame to represent sensor frames

A localization stack should have the following components:

  • Map manager (outputs /earth - /map transform)
  • Transform manager (outputs full chain of transforms)
  • Absolute Localizer (e.g. GPS, outputs /earth - /base_link transform)
  • Relative Localizer (e.g. NDT Matching, outputs /map - /base_link transform; see the sketch after this list)
  • Local Localizer (e.g. visual odometry, IMU, outputs /odom - /base_link transform)
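To make the frame bookkeeping concrete: since REP 105 chains the tree as map → odom → base_link, a relative localizer that estimates map → base_link typically feeds TF by composing its estimate with the latest odometry and publishing map → odom. A sketch of the arithmetic (not the actual design):

```cpp
#include <tf2/LinearMath/Transform.h>

// Given T_map_base from the relative localizer and T_odom_base from
// odometry, the transform to publish on TF is
//   T_map_odom = T_map_base * T_odom_base^-1
tf2::Transform computeMapToOdom(
  const tf2::Transform & t_map_base,
  const tf2::Transform & t_odom_base)
{
  return t_map_base * t_odom_base.inverse();
}
```

This is also why the map → odom transform exists at all, even though map → base_link and odom → base_link are estimated independently.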

Mapping is considered a separate, but closely related, concern to localization.

In addition, downstream components should make use of the following component, which understands the semantics laid out in this document:

  • LocalBufferCore (Updates /odom - /base_link transforms, permits /base_link - /base_link transforms across time; see the sketch below)

The specific interfaces are detailed in the full design document (linked below).

Depending on the use case, some subset of components may be needed.
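Regarding the base_link - base_link transforms across time: that capability maps onto tf2's existing six-argument "time travel" lookup, which answers "where was base_link at time t1, expressed relative to base_link at time t2?" via a fixed frame. A sketch assuming ROS 2's tf2_ros, with odom as the fixed frame:

```cpp
#include <string>

#include <geometry_msgs/msg/transform_stamped.hpp>
#include <tf2/time.h>
#include <tf2_ros/buffer.h>

// Look up the pose of base_link at time t1 relative to base_link at
// time t2, using "odom" as the frame assumed consistent in between.
geometry_msgs::msg::TransformStamped lookupAcrossTime(
  const tf2_ros::Buffer & buffer,
  const tf2::TimePoint & t1,
  const tf2::TimePoint & t2)
{
  return buffer.lookupTransform(
    "base_link", t1,   // target frame, at target time
    "base_link", t2,   // source frame, at source time
    "odom");           // fixed frame
}
```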

If you guys want to take a look and comment on the full document, please feel free to do so! The MR is here: [AWF13] High level localization design doc (!90) · Merge requests · Autoware Foundation / Autoware.Auto / AutowareAuto · GitLab

As a side note, the way that the proposed Autoware.AI stack would fit would be as follows:

  • LiDAR/Camera → relative localizers
  • HD Map → map manager
  • IMU/Encoder → odometry (and/or directly to a motion/state estimator)
  • GNSS → absolute localizer

@cho3 Thanks for the proposal. Can you write it up (along with any revisions from the hackathon) and bring it to the next Autoware WG meeting?

@cho3 I’d like to see more information on the data flows across topics, especially with regards to data rates. And particular attention needs to be paid to how we will use TF. We should be using TF as much as possible (universally is ideal), but we need to ensure that we don’t have every node flooded with TF information. This means judicious TF usage design and figuring out how to set filters on TF subscriptions to ensure only the desired frames are received - using DDS filtering if necessary.

There is a potential opportunity here to improve the design of TF significantly in ROS 2 to better support complex and high-update-rate robots.

@gbiggs I broadly sketched out the data flows internal to the proposed localization architecture (in text; a nice picture would probably be ideal). Both internal and external to the localization architecture, the data flow very much depends on the use case.

Broadly speaking, the data rate could be anywhere from 5 Hz to 1000 Hz, depending on the sensors used (e.g. LiDAR only vs. IMU with no filtering) and what kind of resolution you need for data alignment. If I absolutely had to come up with a range, I would guesstimate something like 25-60 Hz, maybe up to 100 Hz depending on control needs. The transform resolution needed does vary from component to component, however, so that’s another good opportunity for time-based filtering (which I believe DDS does support).

Unless you start complicating the system architecture and have components that are responsible for the data alignment, most components that take in multiple topics will probably need access to some kind of tf, barring some strict architectural assumptions.

Finally, speaking broadly about filtering, I think the tf2_msgs/TFMessage message type is a good opportunity to use DDS keys (i.e. keyed on frames).

I agree that having both the odom frame and the map frame (conforming to REP 105) enables combining different use cases at the same time: for example, road user tracking (odom) and global localization (map). AFAIK, the unintuitive TF tree, with a map-to-odom publisher, is enforced because TF uses a tree structure: any frame can only have a single parent, while multiple children are allowed. However, having the map and odom frames as children of base_link seems unintuitive as well. I don’t know if there is any plan to change TF into e.g. a graph structure at some point.

If the Autoware localization conformed to REP 105, that would also make it easier to combine Autoware global localization with the robot_localization package[1] (or the newer fuse package[2]) for (smoother) odometry estimation.

[1] https://github.com/cra-ros-pkg/robot_localization
[2] https://github.com/locusrobotics/fuse

“I am orbiting the Sun and the Earth at the same time.”

If you put this into a tree (with the obvious hierarchy), the geometric transformations are unambiguous, and it is possible to calculate a pose relative to a distant parent.

If you allow multiple parents, with both the Sun and the Earth as direct parents, then depending on which transformation you apply (i.e. which route you use to traverse the graph), it becomes possible for the robot to be in completely different places at the same time, relative to whatever is above the direct parents in the hierarchy. You would have to set rules on things like ensuring all transforms via all parents are applied, what order they are applied in, and so on, and in the end you would have a graph loaded with rules that could ultimately be unwound into a tree anyway.

I’m sure Tully will correct me if I’m wrong, but I don’t see how a graph can work.

I see. Yeah, a graph will result in ambiguity if there are multiple paths that can be traversed to get from A to B. If the option of traversing multiple paths is disallowed, then it’s basically a tree again. Back to the map, odom and base_link frames in a TF tree: wouldn’t it then make more sense to make both map and odom children of base_link? I think that the map->odom transform is unintuitive, since usually odom->base_link and map->base_link are estimated independently.

If map is a child of base_link, how would multiple robots drive around in the same map?

You’re right. That would make it hard to use multiple robots in the same map. So does that explain why the TF tree according to REP 105 is map->odom->base_link? Would it be a good idea to clarify this rationale a bit more in REP 105? All it currently says is: “Although intuition would say that both map and odom should be attached to base_link, this is not allowed because each frame can only have one parent.”