[Request for Input] Potential Existing SLAM Toolbox Serialized File Invalidation

Hi all,

Its your friendly neighborhood navigator here. I wanted to bring to you attention a handful of PRs and a discussion that’s been going on for the better part of a year regarding 360 lidar & non-axially-aligned lidar support on SLAM Toolbox. Some of you have been able to use SLAM Toolbox in these conditions without a problem, but others have reported some behaviors we’ve been working on solving.

We’ve found a number of issues that were blocking us from supporting this and resolved the vast majority of them. Alas, one remains that we have a workable solution for which will forever put this issue to bed.

It looks like there was a bug in the original openKarto code that no one caught resulting in a mix-up between the robot base pose and the laser pose when interacting with the optimization engines. This was the root cause of these issues. In fixing this, we allow for any lidar mounted in any position and resolve the issues RPlidar users were reporting (and fixes the largest standing bug stopping multi-laser support!). Overall, is the correct course of action.

However, this fix comes at a price. Because we’re messing at the lowest levels of openKarto, we’re actually changing the information embedded in the serialized files. That means that any files you’ve serialized prior to this PR may no longer work as expected after this change. The serialized pose-graph information given to the optimizer will now be w.r.t. robot base frame where previously it was stored as laser frame. When you try to re-run your pose-graph over it, the results could be wacky since the new information being added in a localization mode or continued-mapping mode will be in another reference frame.

This is a non-issue if you have a lidar mounted with the primary axes aligned to your robot base frame (e.x. laser mounted forward; either right side up or up side down). This will only be an issue for anyone mounted at an angle or backwards, but also solves a long standing issue impacting those same people.

I’m here to ask if breaking some old serialized files mid-distribution would be OK to you considering the benefits it provides, especially considering it only breaks behavior of people already with broken behavior they need fixed. Note that this does not in any way impact saved png, pgm, etc occupancy map files saved, only the serialized .posegraph and .data files SLAM Toolbox creates on request.

For any new distributions, this PR will be merged so that new users don’t run into any of these serialization issues - so your files will become invalidated eventually anyway. So we’re really just talking about released distros of melodic noetic foxy and dashing. This is an ABI breaking change I would typically not consider merging, however since ROS(1) is “done” and will still have users for many years, it seems like shooting ourselves in the foot to not fix this issue for everyone else that comes along for the next 4 years.

What I’m leaning towards, since this is a big fix and a big break at the same time is to create distro_name-devel-unfixed branches for the released distributions which will branch off before this change so that all existing users have access to the existing behavior if they want it. Then we’d merge these changes and re-release with the patch in the next distro sync. We’d add warnings to the readme files as well letting people know this occurred.

Please let me know what you think.


Ref: 2 Major issues on the topic

Ref: The PRs to resolve for each distribution

5 Likes

As a note, we have no intention of making any other serialized file format / content changes in the foreseeable future, so this is probably a one off.

Sorry if I am misunderstanding it. But does this means that the past saved files will not work, but going forward there are no other issues?

Would this affect any front end when interacting with the map saving and loading?

Yes. Though its not that all past saved files will not work, its that some won’t.

No change to any of this. Its just what frame data is being serialized in for the optimizer graph.

It doesn’t sound like there’s alot of thoughts here… unless there are strong objections to my proposal, I’ll plan on executing it this week. 4 people have “hearted” this as having read it, I take that as approval since they raised no concerns.

Steve

Is there a way to store some new information in files generated by the fixed toolbox and then automatically decide whether the old or new behavior should be used? That would provide 100% backwards compatibility. And there could be e.g. a flag or something that would force the toolbox to use the new behavior even on old files.

It’s changing just information in a couple of vectors into another frame so its not decipherable if its “old” or “new”. The old is objectively wrong which was what was causing issues for non-axially mounted or 360 lidars.

I’d need to refamiliarize myself with boost serialization to know if we can tack on a new object with some metadata that we could use to pull out if new. The problem is since it wouldn’t exist in the old files at all, I’m not sure off of the top of my head if boost serialization will let me cleanly check for this and do something else instead. Even if it could, propagating that state through the SLAM Toolbox pipeline could become a real undertaking in it of itself.

I’m not going to say that this would be impossible to accomplish, but it is a bit more of an undertaking than I can commit to working out myself at this time due to purely time constraints.

Another option that would be probably much easier would be to have a bool to set whether you want old or new behavior, but it would default be off so that people got the fix. This feels almost to the point where just maintaining a parallel branch for the unfixed version would better for users though.

It’s understandable you don’t have time for the theoretically best solution. If the situation is such that people with tilted lidars had never slam toolbox working, then it’s probably okay to go with the breaking change. Keeping the unfixed branch sounds like a reasonable solution.

Awesome: As a final resolution this is what we’re doing

  • Maintaining separate unfixed branches for all released distributions
  • Adding to readme a link to this discussion and the resolution
  • Merging and releasing updates of all other distributions for this patch

The new readme section reads

As of 03/23/2021, the contents of the serialized files has changed. For all new users after this date, this regard this section it does not impact you.

If you have previously existing serialized files (e.g. not pgm maps, but .posegraph serialized slam sessions), after this date, you may need to take some action to maintain current features. Unfortunately, an ABI breaking change was required to be made in order to fix a very large bug affecting any 360 or non-axially-mounted LIDAR system.

This Discourse post highlights the issues. The frame storing the scan data for the optimizer was incorrect leading to explosions or flipping of maps for 360 and non-axially-aligned robots when using conservative loss functions. This change permanently fixes this issue, however it changes the frame of reference that this data is stored and serialized in. If your system as a non-360 lidar and it is mounted with its frame aligned with the robot base frame, you're unlikely to notice a problem and can disregard this statement. For all others noticing issues, you have the following options:

    Use the <distro>-devel-unfixed branch rather than <distro>-devel, which contains the unfixed version of this distribution's release which will be maintained in parallel to the main branches to have an option to continue with your working solution
    Convert your serialized files into the new reference frame with an offline utility
    Take the raw data and rerun the SLAM sessions to get a new serialized file with the right content

More of the conversation can be seen on tickets #198 and #281. I apologize for the inconvenience, however this solves a very large bug that was impacting a large number of users. I've worked hard to make sure there's a viable path forward for everyone.