
Localization Architecture


I’ve added two modifications for clarification.

# Abstract

The new architecture accepts multiple configurations of sensors and localization algorithms by using sensor fusion such as EKF (Extended Kalman Filter) and PF (Particle Filter).

- The minimum configuration is the same as the current Autoware. Localization shall work only with LiDAR. The other sensors and vehicle data are optional.
+ The minimum configuration is the same as the current Autoware. Localization shall work only with one sensor, e.g. LiDAR. The other sensors and vehicle data are optional.
Inputs of Fusion
  * Pose
    - Shall input one of the pose_estimators’ outputs.
    - Shall be aligned to world frame.
    - Has covariance for fusion.
  * Twist
    - Shall be aligned to base_link frame.
    - IMU and CAN (vehicle_status) are both optional when twist is calculated from a series of poses.
+   - Has covariance for fusion.
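To make the input spec concrete, here is a hypothetical sketch (names are mine) of the two fusion inputs as plain Python dataclasses, mirroring the 6x6 row-major covariance layout used by ROS's PoseWithCovariance and TwistWithCovariance messages:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class PoseInput:
    """Output of one pose_estimator, aligned to the world frame."""
    xyz: Tuple[float, float, float]   # position (x, y, z) in the world frame
    rpy: Tuple[float, float, float]   # orientation (roll, pitch, yaw) in the world frame
    # 6x6 covariance, row-major, for weighting this pose during fusion
    covariance: List[float] = field(default_factory=lambda: [0.0] * 36)

@dataclass
class TwistInput:
    """Velocity input, aligned to the base_link frame."""
    linear: Tuple[float, float, float]    # (vx, vy, vz) in base_link
    angular: Tuple[float, float, float]   # (wx, wy, wz) in base_link
    # 6x6 covariance, row-major, as the proposed modification adds
    covariance: List[float] = field(default_factory=lambda: [0.0] * 36)
```
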


Hi There! Glad to see an EKF is in the works, it will improve autoware dramatically!
If we are going to re-vamp the localization engine, now is an excellent time to properly conform to REP-105.
Specifically, I strongly suggest using the earth->map->odom->base_link tf tree.
I could list some of the benefits for it, but they are already documented in the REP, so please give it a good read.
Here is how I suggest we transition to it:

  • Rename the “world” frame to “earth” and move it to the center of the earth (as per ECEF).
  • The map frame stays the same, except that we try to enforce ENU (only when geospatial data is used, though). People can still use Japan coordinates, UTM, whatever, for the origin of their map frame.
  • The map -> odom tf is just an identity transform (no change), and we use the odom->base_link tf the same way we currently use the map->base_link tf. Later, when we properly integrate all the various localization sources (unless that is what the above is planning to do? maybe we do it now?), the map -> odom -> base_link chain can be updated as described in the REP.

Also, in the above, can I propose we rename “twist_generator” to “vehicle_odometry” or something more descriptive of what function it is actually performing?

Let me know if there is any way I can help roll out these suggestions.



@Ian_Colwell Thank you for the great comment!

  • Regarding TF, @hakuturu583 has also started a discussion about frames.
    It seems like another big topic. Should we separate the discussion?

  • I agree that a descriptive name is definitely necessary, as you mentioned.
    What do you think about the naming convention? The original names follow
    a unified convention, e.g. estimator, generator, localizer, etc.



@Ian_Colwell I think the odom frame is unnecessary for Autoware.

Specifically, I strongly suggest using the earth->map->odom->base_link tf tree.

Almost all of the ROS nodes that require an odom frame, such as gmapping, amcl, robot_pose_ekf, etc., cannot be used with Autoware anyway.
So I think Autoware should not follow all the rules of REP 105.
If we follow all of them, the Autoware architecture becomes too complex.

Also, in the above, can I propose we rename “twist_generator” to “vehicle_odometry” or something more descriptive of what function it is actually performing?

twist_generator only publishes twist messages; it does not publish pose.
If we rename it to “vehicle_odometry”, that would be misleading.



He is talking about frames for localization, so I think there is no problem discussing this issue in this channel.



I disagree that following REP 105 would make autoware too complex. As msmcconnell said in the linked github issue “This will allow other ros users to easily understand our frames (map, base_link, etc.)”.
I think it would make autoware less complex because then users don’t have to learn a custom frame standard for autoware.

I see your point about the twist_generator name. Looking at the diagrams again, I have some more thoughts:

  1. Is the IMU data only being used for angular velocities (since it must fit into a twist message)? I would suggest feeding the IMU data directly into the EKF, since it contains state information that would otherwise be thrown out (linear acceleration).
  2. What does the twist_generator actually do? Does it only combine two messages together, or does it perform some kind of filtering or processing? Maybe we just get rid of twist_generator completely and have the EKF subscribe directly to vehicle state/status and IMU data?

I think it would help if we fully defined the state matrix that the EKF is trying to manage, it might help us plan out where the information is coming from. This is off the top of my head:

  • Position (x, y, z) (map frame)
  • Orientation (roll, pitch, yaw) (map frame)
  • Linear velocity (x) (base_link frame)
  • Angular velocity (yaw) (base_link frame)
  • linear acceleration (x) (base_link frame)

What do you think about those states?
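To make the proposal concrete, here is a rough Python sketch of that state and a predict step over it. The indices and names are mine, and this is only the motion-model half of an EKF; covariance propagation and the measurement updates are omitted:

```python
import math

# Indices into the proposed state vector (names are illustrative):
# x, y, z, roll, pitch, yaw in the map frame;
# longitudinal velocity, yaw rate, longitudinal acceleration in base_link.
X, Y, Z, ROLL, PITCH, YAW, VX, WZ, AX = range(9)

def predict(state, dt):
    """Motion-model predict: constant longitudinal acceleration and
    constant yaw rate over dt (covariance propagation omitted)."""
    s = list(state)
    v_mean = s[VX] + 0.5 * s[AX] * dt       # mean velocity over the interval
    s[X] += v_mean * dt * math.cos(s[YAW])
    s[Y] += v_mean * dt * math.sin(s[YAW])
    s[YAW] += s[WZ] * dt
    s[VX] += s[AX] * dt
    return s

# A car at the origin heading along +x at 10 m/s, no turn, no acceleration:
print(predict([0.0] * 6 + [10.0, 0.0, 0.0], 1.0))  # x advances 10 m
```
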



At first blush the odom frame can seem to be unnecessary. However it’s very important due to the potential for discontinuities in localization estimates.

The inspiration for having an inertially consistent coordinate frame came from my experiences with autonomous cars in the early DARPA Grand Challenge days. For safety purposes, cars must remain operational with full navigation even if global localization fails. There are innumerable reasons for failures, ranging from foreseeable ones like driving through a tunnel to the periodic alignment of all visible GPS satellites in a single line, which prevents triangulation. Even if you’re doing localization based on ground features, it’s quite possible that you’ll encounter a construction zone or other new or modified environment where you don’t have enough premapped features to track. This is fine for a while: if your localization proceeds temporarily in dead-reckoning mode, everything degrades slowly at the limit of your odometric estimation.

The challenge comes when your localization system locks in again after you’ve been dead reckoning. You now find out that you’re off by 1 m globally, so you adjust your position estimate in the world by 1 m. But what about that vehicle you observed in the oncoming lane: is it now 1 m to the left too? If you’re tracking its velocity and estimating where it’s going, it looks like it just made a 90-degree turn sideways at high velocity, because based on the observed global positions it just did. Even though the laser saw it 10 cm closer in the oncoming lane, and your IMU told you that you moved forward 10 cm since the last update, so you can estimate that it’s actually stopped in the odom frame. Suddenly you cannot distinguish between a stopped car and one in the process of cutting you off, and you slam on your brakes to avoid a stopped car in the other lane.
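To put rough numbers on that scenario (all values made up), a one-dimensional sketch of how the same observations look in the map frame versus the odom frame across a relocalization jump:

```python
# 1D positions in meters along the road.
map_to_odom = 0.0                  # believed map->odom offset before the fix
obs_prev_odom = 50.0               # oncoming car observed 50 m ahead, in odom
obs_prev_map = obs_prev_odom + map_to_odom

# Ego moves 0.1 m forward and the laser sees the car 0.1 m closer,
# so the car has not moved in the odom frame. Meanwhile, global
# localization locks back in with a 1 m correction, which REP-105
# folds into map->odom, leaving odom->base_link continuous.
map_to_odom = 1.0
obs_now_odom = 50.0                # unchanged: the car is stationary
obs_now_map = obs_now_odom + map_to_odom

jump_in_map = obs_now_map - obs_prev_map     # apparent 1 m jump
jump_in_odom = obs_now_odom - obs_prev_odom  # correctly zero
print(jump_in_map, jump_in_odom)
```

A tracker working in the map frame sees a phantom 1 m displacement; the same tracker working in the odom frame correctly sees a stationary car.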

In general, localization provides you with the best estimate of your current position, and its solution is not expected to be continuous. The jitter in the localization translates directly into blurring of your mapping if you track your objects in the localized frame. The level of jitter from odometry systems is usually much smaller, so you can operate with much smaller tolerances. From the PR2 use cases, this is the difference between being able to drive through a doorway with a few cm of clearance on each side and not, when you have a few cm of localization noise. The same sort of scenario could play out when a car is trying to drive through a narrow tunnel exit: as the localization fix jumps around, the walls will suddenly get blurred and the car will likely consider the road too narrow to drive down.

There’s also a significant benefit to following the standard for general interoperability. It is true that you are unlikely to leverage the specific nodes that implement this standard, but the reason they all implement it is that it’s important to the functionality of the system. Looking forward, there may be systems implemented in other projects that you will want to leverage, and if you follow the standard and they follow the standard, you will be compatible in the future.

In general, any time you’re facing a decision and there’s a common standard, I would strongly recommend that the default be to use the standard, and that a strong argument be made for diverging from it. “Because it’s too complicated” is not a strong argument. A strong argument would cover: this is what they’re doing, this is why they’re doing it, this is why it’s not applicable to our use case, and this is what we should do instead that will cover all of our use cases.



If I recall correctly, a team entry during the DARPA Urban Challenge 2007 Qualifying Event lost GPS while in motion and subsequently drove into a K-rail given the failure in local pose estimation:

After assessing the log data we determined the cause of the accident was a failure in the communication link between our GPS/IMU and the main control computer. Without “pose” information the robot could not know that it was moving

The issue is that the pose estimator is essential to control; without knowing its pose with respect to the wall, the robot can’t know what to do. In fact, looking at the data from the robot’s point of view, the wall hit it.



Thanks @ruffsl, failure analysis can be invaluable. Here’s the one that I’m most familiar with which is also a well documented localization issue. And as a bonus it has a video that captures the moment.

Here’s the spectator’s view

Here’s the robot’s-eye view; the same time slot is about 3:50 into the video.

And here’s the paper with an analysis of the crash:

There are relevant sections in 7.2 National Qualifying Event, as well as in 7.3 Grand Challenge Event and 8 Lessons Learned and Future Work.

Clearly, over a decade later, localization systems are more accurate and more reliable, but there can still be failures, and the system needs to remain robust. And global localization is still one of the most externally dependent systems in an autonomous car, relying on external data sources in real time that can be interrupted (à la GPS) or on externally obtained datasets that may be incomplete or out of date.



I’m in favour of this. Unless anyone has a strong argument against doing so, I think we should try to move in this direction. The more ROS norms we conform to, the easier it will be for others to contribute.

That’s why @Ian_Colwell said to make it an identity transform. This keeps us compatible with ROS norms while not having any impact on the actual transforms or on the complexity of the tf tree. It is quite common for a robot to have a particular transform be identity so it can be compatible with established norms.

That is not a good thing. It is not something to be happy about and it is not something to aim for.

Remember, the best practices developed in ROS are not just for compatibility. They are also because thousands of people have spent more than 10 years figuring out how to make robots work well, and these are the solutions they have come up with.

You give no justification for this blanket statement.

As we have discussed previously, it’s also not generating anything. It is calculating the offset in space (linear and rotational) from the previous frame of data. So the twist_generator name is also misleading.

This is why Autoware.Auto must use a more defined process that starts from requirements. Designing based on “that feels right” won’t get us very far!



I think odometry is optional data.
If we add an “odom” frame, we have to publish odometry data.
This means that vehicles without sensors such as odometers, IMUs, encoders…, cannot be used with Autoware.



Thank you for the very important point.
I feel the odom frame is important in terms of safety.



Sorry, I forgot that we can publish odometry data from the Fusion Localizer.



That is incorrect. All that needs to be published is a static transform that places in the tree the frame that does not move with respect to its parent (in this case, the map->odom link). After that it doesn’t need to be published at all but the link will still be there.
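A toy 2D sketch (a hypothetical helper, not the actual tf2 API) of why the identity link costs nothing: composing a static identity map->odom with odom->base_link yields exactly the map->base_link that is published today.

```python
import math

def compose(a, b):
    """Compose two 2D transforms (x, y, yaw): parent->child, then child->grandchild."""
    ax, ay, ath = a
    bx, by, bth = b
    return (ax + bx * math.cos(ath) - by * math.sin(ath),
            ay + bx * math.sin(ath) + by * math.cos(ath),
            ath + bth)

map_to_odom = (0.0, 0.0, 0.0)    # static identity, broadcast once and latched
odom_to_base = (12.0, 3.0, 0.5)  # what the localizer used to publish as map->base_link

map_to_base = compose(map_to_odom, odom_to_base)
print(map_to_base)               # identical to odom_to_base
```

Nothing downstream changes, but the tree now has the REP-105 shape and the map->odom link can take on real corrections later.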



This is a very common pattern in ROS. Remember, “odometry” is an abstraction that is independent from where that data comes from.



I do sympathize with @hakuturu583 a little. At first glance, this TF tree is not intuitive to understand.

E.g. gps_localizer estimates the pose between the map and gps frame, but in order to work within the TF tree, it needs to publish the transform between the map and the odom frame. I am actually still not sure whether this is just a limitation arising from the fact that TF is a tree (as opposed to, e.g., a graph) or whether it really is the most elegant solution. But it has been used in ROS for years, e.g.
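For what it’s worth, that pattern can be sketched in a few lines of 2D transform math (a toy sketch, not the tf2 API): a localizer that really estimates map->base_link broadcasts map->odom as the composition of its estimate with the inverse of the current odom->base_link, so the chain still composes to its estimate.

```python
import math

def compose(a, b):
    """Compose two 2D transforms (x, y, yaw)."""
    ax, ay, ath = a
    bx, by, bth = b
    return (ax + bx * math.cos(ath) - by * math.sin(ath),
            ay + bx * math.sin(ath) + by * math.cos(ath),
            ath + bth)

def invert(t):
    """Invert a 2D transform (x, y, yaw)."""
    x, y, th = t
    return (-x * math.cos(th) - y * math.sin(th),
             x * math.sin(th) - y * math.cos(th),
            -th)

# The localizer's actual estimate, and the odometry's current belief (made-up numbers):
map_to_base = (100.0, 50.0, 0.2)
odom_to_base = (10.0, 5.0, 0.1)

# What gets broadcast so the tree stays map -> odom -> base_link:
map_to_odom = compose(map_to_base, invert(odom_to_base))

# Sanity check: composing the chain reproduces the localizer's estimate.
print(compose(map_to_odom, odom_to_base))
```

This is the same trick amcl uses: the node never publishes map->base_link directly, yet the chained transform equals its pose estimate.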

In any case I do agree with all here that we need to use an odom frame and stick to the REP 105.



@Ian_Colwell In AutowareAuto we have implemented a rather generic EKF + CATR motion model for mostly object tracking purposes:

  (the equations there are messed up because they were written for doxygen markdown format)

I think that your state proposal has orientation and angular velocity which CATR does not encode directly.

We are now also revisiting this and trying to figure out whether it will be enough for the localization use case as well. If not, we should potentially think about a UKF, an IMM, or a better motion model.
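In case it helps the comparison, here is my reading of a CATR (constant acceleration and turn rate) predict step, as a toy Euler integration in Python; this is a sketch of the model, not the Autoware.Auto implementation:

```python
import math

def catr_predict(x, y, heading, v, a, w, dt):
    """One CATR prediction step: constant (longitudinal) acceleration `a`
    and constant turn rate `w`, integrated with a simple Euler step."""
    x += v * math.cos(heading) * dt
    y += v * math.sin(heading) * dt
    heading += w * dt
    v += a * dt
    return x, y, heading, v

# An object at the origin heading along +x at 5 m/s,
# accelerating at 1 m/s^2 and turning at 0.1 rad/s:
state = (0.0, 0.0, 0.0, 5.0)
for _ in range(10):                  # integrate 1 s in 0.1 s steps
    state = catr_predict(*state, a=1.0, w=0.1, dt=0.1)
print(state)
```

Note the state here is (x, y, heading, v); orientation beyond heading, and angular velocity beyond the fixed turn rate, are not part of the model.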

If you have more thoughts on this let us know.



Currently autoware does work this way, but really, a generic gps_localizer should produce a Position or Pose (depending on the GPS hardware) in the “earth” frame using ECEF coordinates. Then the above-mentioned EKF node can take this Pose and update the map->odom->base_link TFs accordingly.

This would also allow more flexibility when using autoware anywhere in the world (currently, I’ve heard stories of people having to fake their GPS data and pretend they are in Japan to use autoware without modification). All the user has to do is provide the earth -> map TF, and then you can run autoware anywhere you want. Last week I wrote a quick gps-only localization node (no EKF) to see if I could localize without ndt_matching at all (in case you are in a featureless environment). Autoware is currently operating just fine (still more testing needed, though) using the methodology I described, thanks to the de-coupling that TF trees provide: if everything operates in the map frame, it doesn’t matter where the map frame is. I’m hoping to bring some of this work into autoware, but we need to define this EKF node first.
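As a toy sketch of that flow (made-up numbers, and ignoring the ECEF->ENU rotation that a real ENU map frame would also need), expressing an ECEF fix in the map frame given a user-supplied earth->map origin:

```python
# A GPS fix in the "earth" frame (ECEF, meters) and the user-supplied
# earth->map origin. Rotation is omitted for brevity; a real ENU map
# frame also needs the ECEF->ENU rotation at the map origin.
earth_to_map_origin = (-3961000.0, 3349000.0, 3698000.0)  # made-up map origin in ECEF

def fix_to_map(fix_ecef):
    """Express an ECEF position in the map frame (translation only)."""
    return tuple(f - o for f, o in zip(fix_ecef, earth_to_map_origin))

fix = (-3960990.0, 3349005.0, 3698002.0)
print(fix_to_map(fix))          # position relative to the map origin
```

With only the earth -> map TF changed, the same pipeline runs anywhere in the world; everything downstream still works purely in the map frame.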

I’m not following you on that; wouldn’t TR (turn rate) from CATR be the same as angular velocity about z (i.e. yaw)? In the link you provided, is that just an example of a CATR motion model, or the one actually in use for tracking purposes? The lack of orientation (yaw) surprises me; wouldn’t we want to filter the direction of detected/tracked objects? Or is the orientation just passed through from object detection? I’ve never worked closely with object tracking code, so I may be over-estimating what is possible; I know detected-object noise makes it hard to get good state estimations of objects, so people tend to opt for simpler EKFs.

This is the first I’ve heard of the CATR acronym, and I’m definitely not a localization expert (it’s been years since I’ve needed to look closely at an EKF), but it sounds similar to what we’ve used in the past which was basically a really simple bicycle motion model with constant acceleration (this was for ego state estimation, not sure what we used for detected object tracking).



My bad @Ian_Colwell.

  1. turn rate is indeed the angular velocity.
  2. this package is indeed an actual CATR motion model implementation
  3. I was wrong regarding the state. In fact, we use the exact same state struct as you list above, except that the orientation is represented through heading rather than a full 3DOF orientation.

LMK if this clarifies it.




Ah, that makes more sense, all I had previously looked at was the
The state that you linked looks great for tracked objects!

For ego localization, I suggested a 3DOF orientation for a couple reasons:

  • We found pitch was useful in speed controls to determine if the car was on a hill or not.
  • I am unfamiliar with the current ndt_matching, but most lidar localization implementations work better when you have roll and pitch, since small roll or pitch angles can drastically change the lidar scan which is used for matching. I think ndt_matching estimates Pose with full orientation anyway.
  • Novatel SPAN devices (or most GPS-INS systems) provide a full 3DOF orientation, so we might as well use it.

One other difference I noticed is that the tracker EKF does not estimate Z position. This is probably fine; the z value could just be passed through from detection. However, for ego localization, the Z position is definitely needed.