This proposal is a starting point for the discussion. Feel free to comment.
In the interests of keeping the information in one place and easy to access, here is the content of that Google Docs file.
Background
The current Autoware (v1.11) depends only on LiDAR (ndt_matching). Other inputs such as GNSS, CAN, and IMU are used only to guess the initial search position for the ndt_matching algorithm. This makes it difficult to scale up the range of scenarios in which Autoware can drive.
The new architecture accepts multiple configurations of sensors and localization algorithms by using sensor fusion techniques such as an EKF (Extended Kalman Filter) or a PF (Particle Filter).
The minimum configuration is the same as the current Autoware: localization shall work with LiDAR alone. The other sensors and vehicle data are optional.
Descriptions
This figure is in progress and will be updated based on discussions.
Inputs of Localization
Localization takes sensor data as input. The sensor data might be pre-processed before localization; the details are not defined in the figure yet. For example, pointcloud concatenation might happen outside of localization.
IMU and CAN (vehicle_status) are both optional when twist is calculated from the series of /localization/pose outputs.
At least one of the pose_estimators (sensors) is mandatory; the others are optional. The accuracy of localization depends on which sensors and algorithms are selected.
Outputs of Localization
Localization outputs pose and twist. It should output them at 100 Hz or more for accurate control. The output rate can be held at 100 Hz by e.g. linear interpolation even if the sensors publish at less than 100 Hz.
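For illustration, the upsampling could be as simple as the following sketch (pure Python; the 2D pose tuple and the shortest-arc yaw handling are my assumptions, not a settled design):

```python
import math

def interpolate_pose(p0, p1, t):
    """Linearly interpolate between two timestamped 2D poses.

    p0, p1: (stamp, x, y, yaw) tuples with p0 earlier than p1; t is the
    query time. Positions are lerped; yaw follows the shortest arc.
    """
    (t0, x0, y0, yaw0), (t1, x1, y1, yaw1) = p0, p1
    a = (t - t0) / (t1 - t0)                  # interpolation factor in [0, 1]
    dyaw = math.atan2(math.sin(yaw1 - yaw0),  # shortest signed yaw difference
                      math.cos(yaw1 - yaw0))
    return (t, x0 + a * (x1 - x0), y0 + a * (y1 - y0), yaw0 + a * dyaw)
```

A 100 Hz timer would call this on the two most recent estimator poses and publish the result, so the control stack sees a steady rate even from a 10 Hz estimator.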
Inputs of Fusion
* Pose
- Shall input one of the pose_estimators’ outputs.
- Shall be aligned to the world frame.
- Has covariance for fusion (see the sketch after this list).
* Twist
- Shall be aligned to the base_link frame.
- IMU and CAN (vehicle_status) are both optional when twist is calculated from a series of poses.
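To make the fusion step concrete, here is a minimal EKF sketch that predicts with the base_link twist and corrects with a world-frame pose (a toy 2D model; the 3-element state and the noise handling are illustrative assumptions, not the proposed Autoware design):

```python
import numpy as np

class PoseTwistEKF:
    """Toy 2D EKF: state x = [x, y, yaw]; predict with twist, correct with pose."""

    def __init__(self):
        self.x = np.zeros(3)            # [x, y, yaw] in the world frame
        self.P = np.eye(3)              # state covariance

    def predict(self, v, omega, dt, Q):
        """Propagate with a base_link twist (v: forward speed, omega: yaw rate)."""
        yaw = self.x[2]
        self.x += np.array([v * np.cos(yaw) * dt,
                            v * np.sin(yaw) * dt,
                            omega * dt])
        F = np.array([[1, 0, -v * np.sin(yaw) * dt],   # motion-model Jacobian
                      [0, 1,  v * np.cos(yaw) * dt],
                      [0, 0,  1]])
        self.P = F @ self.P @ F.T + Q

    def update(self, z, R):
        """Correct with a world-frame pose measurement z = [x, y, yaw]."""
        H = np.eye(3)                   # pose observes the full state directly
        y = z - H @ self.x              # innovation
        y[2] = np.arctan2(np.sin(y[2]), np.cos(y[2]))  # wrap the yaw residual
        S = H @ self.P @ H.T + R
        K = self.P @ H.T @ np.linalg.inv(S)            # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(3) - K @ H) @ self.P
```

The measurement covariance R would come straight from the covariance field of the selected pose_estimator output, which is why the list above requires it.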
# Abstract
The new architecture accepts multiple configurations of sensors and localization algorithms by using sensor fusion such as EKF (Extended Kalman Filter) and PF (Particle Filter).
- The minimum configuration is the same as the current Autoware. Localization shall work only with LiDAR. The other sensors and vehicle data are optional.
+ The minimum configuration is the same as the current Autoware. Localization shall work only with one sensor, e.g. LiDAR. The other sensors and vehicle data are optional.
Inputs of Fusion
* Pose
- Shall input one of the pose_estimators’ outputs.
- Shall be aligned to world frame.
- Has covariance for fusion.
* Twist
- Shall be aligned to base_link frame.
- Imu and can(vehicle_status) are both optional when twist is calculated from series of poses.
+ - Has covariance for fusion.
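For what it's worth, the proposed covariance addition maps directly onto the existing ROS message types; a hypothetical publisher might fill the twist input like this (topic name and numbers are made up for illustration):

```python
import rospy
from geometry_msgs.msg import TwistWithCovarianceStamped

rospy.init_node('twist_source')
pub = rospy.Publisher('/localization/twist_with_covariance',
                      TwistWithCovarianceStamped, queue_size=10)

msg = TwistWithCovarianceStamped()
msg.header.stamp = rospy.Time.now()
msg.header.frame_id = 'base_link'    # twist is expressed in base_link
msg.twist.twist.linear.x = 2.5       # forward speed, e.g. from CAN [m/s]
msg.twist.twist.angular.z = 0.1      # yaw rate, e.g. from IMU [rad/s]
cov = [0.0] * 36                     # 6x6 row-major: x, y, z, rot_x, rot_y, rot_z
cov[0] = 0.04                        # var(linear.x)
cov[35] = 0.01                       # var(angular.z)
msg.twist.covariance = cov
pub.publish(msg)
```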
Hi There! Glad to see an EKF is in the works, it will improve autoware dramatically!
If we are going to re-vamp the localization engine, now is an excellent time to properly conform to REP-105.
Specifically, I strongly suggest using the earth->map->odom->base_link tf tree.
I could list some of the benefits for it, but they are already documented in the REP, so please give it a good read.
Here is how I suggest we transition to it:
1. Rename the “world” frame to “earth” and move it to the center of the earth (as per ECEF).
2. The map frame stays the same, except that we try to enforce ENU (only when geospatial data is used, though). People can still use Japan coordinates, UTM, whatever, for the origin of their map frame.
3. The map->odom tf is just an identity transform (no change), and we use the odom->base_link tf the same way we currently use the map->base_link tf (a sketch of this follows below). Later, when we properly integrate all the various localization sources (unless that is what the above is planning to do? maybe we do it now?), the map->odom->base_link chain can be updated as described in the REP.
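As a sketch of that third step, publishing the identity map->odom transform is a one-time static broadcast, something like the following (the node name is arbitrary; the frames are the REP-105 ones):

```python
import rospy
import tf2_ros
from geometry_msgs.msg import TransformStamped

rospy.init_node('map_to_odom_identity')
broadcaster = tf2_ros.StaticTransformBroadcaster()

t = TransformStamped()
t.header.stamp = rospy.Time.now()
t.header.frame_id = 'map'        # parent frame
t.child_frame_id = 'odom'        # child frame
t.transform.rotation.w = 1.0     # identity: zero translation, unit quaternion

broadcaster.sendTransform(t)     # latched; publishing once is enough
rospy.spin()
```

The same thing can be done with a static_transform_publisher entry in a launch file.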
Also, in the above, can I propose we rename “twist_generator” to “vehicle_odometry” or something more descriptive of the function it is actually performing?
Let me know if there is any way I can help roll out these suggestions.
Thanks!
I agree a descriptive name is definitely necessary, as you mentioned.
What do you think about the naming convention? The original names have
a unified convention, e.g. estimator, generator, localizer, etc.
@Ian_Colwell I think the odom frame is unnecessary for Autoware.
Specifically, I strongly suggest using the earth->map->odom->base_link tf tree.
ROS nodes which require the odom frame, such as gmapping, amcl, robot_pose_ekf, etc., mostly cannot be used with Autoware.
So I think Autoware should not follow all the rules of REP 105.
If we follow all the rules, the Autoware architecture becomes too complex.
Also, in the above, can I propose we rename “twist_generator” to “vehicle_odometry” or something more descriptive of what function it is actually performing?
twist_generator only publishes a twist message; it does not publish pose.
If we rename it to “vehicle_odometry”, that is misleading.
I disagree that following REP 105 would make Autoware too complex. As msmcconnell said in the linked GitHub issue, “This will allow other ros users to easily understand our frames (map, base_link, etc.)”.
I think it would make Autoware less complex, because then users don’t have to learn a custom frame standard for Autoware.
I see your point about the twist_generator name. Looking at the diagrams again, I have some more thoughts:
Is the IMU data only being used for angular velocities (since it must fit into a twist message)? I would suggest feeding the IMU data directly into the EKF, since it contains state information that would otherwise be thrown out (linear acceleration).
What does the twist_generator actually do? Does it only combine two messages together, or does it perform some kind of filtering or processing? Maybe we just get rid of twist_generator completely and have the EKF subscribe directly to vehicle state/status and IMU data?
I think it would help if we fully defined the state matrix that the EKF is trying to manage, it might help us plan out where the information is coming from. This is off the top of my head:
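Purely as an illustration (my assumption, not a fixed proposal), a 2D vehicle state for such an EKF might look like:

$$\mathbf{x} = [\,x,\; y,\; \theta,\; v,\; \omega,\; a\,]^\top$$

where the position (x, y) and yaw θ come from the pose estimators, the longitudinal velocity v from CAN, and the yaw rate ω and longitudinal acceleration a from the IMU, making it explicit which input feeds which state element.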
At first blush the odom frame can seem to be unnecessary. However it’s very important due to the potential for discontinuities in localization estimates.
The inspiration for having an inertially consistent coordinate frame came from my experiences with autonomous cars in the early DARPA Grand Challenge days. For safety purposes, cars must remain operational with full navigation even if global localization fails. There are innumerable reasons for failures: they range from foreseeable ones, like driving through a tunnel, to the periodic alignment of all visible GPS satellites in a single line, which prevents triangulation. Even if you're doing localization based on ground features, it's quite possible that you'll encounter a construction zone or other new or modified environment where you don't have enough premapped features to track.

This is fine for a while: if your localization temporarily falls back to dead reckoning, everything degrades slowly at the limit of your odometric estimation. The challenge comes when your localization system locks in again after you've been dead reckoning. You now find out that you're off by 1 m globally, so you adjust your position estimate in the world by 1 m. But what about that vehicle you observed in the oncoming lane: is it now 1 m to the left too? If you're tracking its velocity and estimating where it's going, it looks like it just made a 90-degree turn sideways at high velocity, because based on the observed global positions it just did. Even though the laser saw it 10 cm closer in the oncoming lane, and your IMU told you that you moved forward 10 cm since the last update, so you can estimate that it's actually stopped in the odom frame. Suddenly you cannot distinguish between a stopped car and one in the process of cutting you off, and you slam on your brakes to avoid a stopped car in the other lane.
In general, localization provides you with the best estimate of your current position, and its solution is not expected to be continuous. The jitter in the localization directly translates to blurring of your mapping if you track your objects in the localized frame. The level of jitter from odometry systems is usually much smaller, and as such you can operate with much smaller tolerances. From the PR2 use cases, this is the difference in being able to drive through a doorway with a few cm of clearance on each side while having a few cm of localization noise. The same sort of scenario could play out when a car is trying to drive through a narrow tunnel exit: as the localization fix improves, the walls will suddenly get blurred and the car will likely consider the road too narrow to drive down.
There’s also a significant benefit to following the standard for general interoperability. It is true that you are unlikely to leverage the specific nodes that implement this standard. But the reason that they all implement the standard is that it's important to the functionality of the system. Looking forward, there may be other systems implemented in other projects which you may want to leverage, and if you follow the standard and they follow the standard, you will be compatible in the future.
In general, any time you're facing a decision and there's a common standard, I would strongly recommend that the default be to use the standard, and that a strong argument be made for diverging from it. “Because it's too complicated” is not a strong argument. A strong argument would cover: this is what they're doing, this is why they're doing it, this is why it's not applicable to our use case, and this is what we should do instead that will cover all of our use cases.
If I recall correctly, a team entry during the DARPA Urban Challenge 2007 Qualifying Event lost GPS while in motion and subsequently drove into a K-rail given the failure in local pose estimation:
After assessing the log data we determined the cause of the accident was a failure in the communication link between our GPS/IMU and the main control computer. Without “pose” information the robot could not know that it was moving
…
The issue is that the pose estimator is essential to control; without knowing its pose with respect to the wall, the robot can’t know what to do. In fact, looking at the data from the robot’s point of view, the wall hit it. Saturday PM: Sting Suffers Setback | Sting Racing - Team 50
Thanks @ruffsl, failure analysis can be invaluable. Here’s the one that I’m most familiar with which is also a well documented localization issue. And as a bonus it has a video that captures the moment.
Here’s the spectator’s view
Here’s the robot’s-eye view; the same time slot is about 3:50 into the video.
There are relevant sections in 7.2 National Qualifying Event, as well as 7.3 Grand Challenge Event and 8 Lessons Learned and Future Work.
Clearly, over a decade later, localization systems are more accurate and have higher reliability, but there can still be failures, and the system needs to remain robust. And global localization is still one of the most externally dependent systems in an autonomous car, relying on external data sources in real time that can be interrupted (à la GPS) or on externally obtained datasets that may be incomplete or out of date.
I’m in favour of this. Unless anyone has a strong argument against doing so, I think we should try to move in this direction. The more ROS norms we conform to, the easier it will be for others to contribute.
That’s why @Ian_Colwell said to make it an identity transform. This keeps us compatible with ROS norms while not having any impact on the actual transforms or on the complexity of the tf tree. It is quite common for a robot to have a particular transform be identity so it can be compatible with the established norms.
That is not a good thing. It is not something to be happy about and it is not something to aim for.
Remember, the best practices developed in ROS are not just for compatibility. They are also because thousands of people have spent more than 10 years figuring out how to make robots work well, and these are the solutions they have come up with.
You give no justification for this blanket statement.
As we have discussed previously, it’s also not generating anything. It is calculating the offset in space (linear and rotational) from the previous frame of data. So the twist_generator name is also misleading.
This is why Autoware.Auto must use a more defined process that starts from requirements. Designing based on “that feels right” won’t get us very far!
I think odometry is optional data.
If we create an “odom” frame, we have to publish odometry data.
This means that vehicles with no sensors such as odometers, IMUs, encoders…, cannot be used with Autoware.
That is incorrect. All that needs to be published is a static transform placing the frame that does not move with respect to its parent into the tree (in this case, the map->odom link). After that it doesn’t need to be published at all, but the link will still be there.
E.g. gps_localizer estimates the pose between the map and gps frames, but in order to fit into the TF tree it needs to publish the transform between the map and odom frames. I am actually still not sure if this is just a limitation of the fact that TF is a tree (as opposed to e.g. a graph) or if this really is the most elegant solution. But it has been used in ROS for years, e.g. amcl - ROS Wiki.
In any case I do agree with all here that we need to use an odom frame and stick to the REP 105.
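A sketch of that amcl-style workaround: the localizer estimates map->base_link, looks up the current odom->base_link, and broadcasts their composition as map->odom (homogeneous 4x4 matrices here; the helper function is illustrative, not an existing API):

```python
import numpy as np
import tf.transformations as tft

def compute_map_to_odom(T_map_base, T_odom_base):
    """Return T_map_odom such that map->odom->base_link reproduces
    the localizer's map->base_link estimate:
        T_map_base = T_map_odom @ T_odom_base
    """
    return T_map_base @ np.linalg.inv(T_odom_base)

# Localizer says base_link is 1.0 m ahead in map; odometry says 0.9 m in odom,
# so the broadcast map->odom transform absorbs the 0.1 m correction.
T_map_base = tft.translation_matrix((1.0, 0.0, 0.0))
T_odom_base = tft.translation_matrix((0.9, 0.0, 0.0))
print(compute_map_to_odom(T_map_base, T_odom_base))
```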
I think that your state proposal has orientation and angular velocity, which CATR (constant acceleration and turn rate) does not encode directly.
We are now also revisiting this and trying to figure out whether it will be enough for the localization use case as well. If not, we should potentially think about a UKF, an IMM filter, and a better motion model.
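For reference, the kind of motion model being discussed can be sketched as a single Euler step over a textbook constant-turn-rate-and-acceleration state (note this parameterization carries yaw and turn rate explicitly, unlike the CATR encoding mentioned above; it is a simplification for illustration, not the Autoware.Auto implementation):

```python
import math

def ctra_step(x, y, yaw, v, omega, a, dt):
    """One Euler step of a constant turn rate and acceleration model."""
    return (x + v * math.cos(yaw) * dt,   # position advances along the heading
            y + v * math.sin(yaw) * dt,
            yaw + omega * dt,             # heading evolves with the turn rate
            v + a * dt,                   # speed evolves with the acceleration
            omega, a)                     # turn rate and acceleration held constant
```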