Fast Forward Merging rosbag2 master API to Foxy

Hello friends of rosbag2.

Over the last couple of months, we’ve been working hard to significantly improve rosbag2’s write performance and to make it easier to work with compression.
Together with Robotec.AI, we have focused on improving the storage backend to handle high-volume data traffic, allowing us to record several hundred megabytes per second with our default sqlite3 storage backend. Before the improvements, users experienced significant message loss even at throughputs of 100 Mb/s or less, especially when recording many topics, and some of them told us this was a serious limitation for their use cases. With our optimizations, data is recorded as expected up to the hardware limit of disk write speed, and rosbag2 seems to be on par with rosbag (ROS 1). For more details about the improvements, see the full report on GitHub. You can also refer to the initial meta-ticket on performance evaluation to understand the motivation behind the improvements, as well as the analysis of the Foxy rosbag2 bottlenecks.
Similarly, Amazon put some effort into improving the performance of working with compressed rosbags. Specifically, we’ve introduced a streaming interface that allows reading a compressed bagfile without first loading it into memory as a whole. We’ve also implemented multithreaded compression, which improves performance when writing bagfiles.
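
To illustrate the idea of streaming decompression described above, here is a minimal sketch in Python. It is not the rosbag2 implementation or its API: the function name `iter_decompressed`, the chunk size, and the use of the standard-library `zlib` codec as a stand-in for the actual bag compression format are all assumptions made for illustration.

```python
import io
import zlib

CHUNK_SIZE = 64 * 1024  # compressed bytes read per step (arbitrary, illustrative)


def iter_decompressed(stream, chunk_size=CHUNK_SIZE):
    """Yield decompressed pieces of a zlib-compressed stream.

    Only one chunk of compressed input is read at a time, so the
    compressed file never has to be loaded into memory as a whole.
    """
    decompressor = zlib.decompressobj()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        data = decompressor.decompress(chunk)
        if data:
            yield data
    # Emit whatever the decompressor is still buffering.
    tail = decompressor.flush()
    if tail:
        yield tail


# Usage: an in-memory buffer stands in for a compressed file on disk.
payload = b"hypothetical serialized message " * 10_000
compressed = io.BytesIO(zlib.compress(payload))
total = sum(len(piece) for piece in iter_decompressed(compressed, chunk_size=4096))
assert total == len(payload)
```

A reader built this way can start handing out data after the first chunk instead of waiting for the whole file to be decompressed.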

While these improvements work great on the current master branch, as well as the rolling release, we’d love to provide the improved functionality to most of you as a backport to Foxy.
Unfortunately, these performance improvements come at a price: breaking ABI/API compatibility with the currently released Foxy version. The changes affect the public API in rosbag2_cpp, rosbag2_storage, and the sqlite3 storage plugin, but will not break the overall usage of the rosbag2 command line interface (though there may be some new arguments available).

As a basic rule of thumb, it holds that during the lifespan of a ROS distro, released packages maintain ABI compatibility. While we generally consistently aim for this, we are hereby asking your permission to exceptionally break with this rule.
We consider the performance improvements important enough to justify an API breakage within the same distro release, for two main reasons. First, Foxy has the largest user base, so most people will profit from a binary release. Second, we believe that rosbag2 (and its derivative packages such as rosbag2_storage, rosbag2_cpp, …) is high enough in the stack to be considered a leaf package, and we hope that not too many applications are compiling/linking against its API. Obviously, we are trying to minimize the changes required by client code.
We are thus considering fast forwarding the current API as available on the master branch to Foxy in a one-time shot PR, adapting for all changes to dependencies such as rcutils, rcpputils, etc. We’ve opened a first draft of this PR which shows promising results, having only a few minor adaptations.

We’d love to hear your feedback on this before we take action. We’re internally leaning towards the API break, but if you - the community - have strong arguments why an API change is not an option, we’re all ears. As mentioned above, we will respect ABI compatibility for future enhancements for rosbag2 and consider this an exception.

Respectfully,

your ros-tooling team.

I strongly support this action. Anything that accelerates rosbag2 along to maturity I think is worthwhile.

If we want people to build on a product, it needs to be stable. In my opinion, that is more important than features trickling back to a stable release.

This feels like a bit of a false point. If that’s true, why have a public API at all? I don’t mean that rhetorically-- you made the decision in the past to have a public API. Should that decision be revisited so that you have the freedom to make these kinds of changes if you truly believe rosbag is a leaf application?

While I generally agree with this, I think we’ve reached a point where rosbag2 is considered unusable in Foxy due to performance concerns, and that fact is being considered a blocker for migrating to ROS 2 at all for some projects, preventing expanded adoption. I agree that new features or usability tweaks wouldn’t be worth talking about.

From my perspective, the point we’re trying to make is that rosbag2 is a leaf package in effect in Foxy, if not in intent, because the public APIs don’t seem to be used. I’m thinking of the following observations, which are by no means definitive but are compelling enough to warrant discussion:

  • there are no community-contributed rosbag2 plugins that have been bloom released yet in any ROS 2 distribution. The storage, type conversion, and compression plugins are enabled via a public API
  • rqt_bag is not released into Foxy, which is the first point of external tooling we would expect to see using the rosbag2 C++ API

This suggests to me that rosbag2 hadn’t reached a point of maturity by the time of the Foxy release that anybody has been willing to use it beyond the CLI. I would love to see counterexamples, though!

To balance that a bit, the issues with rosbag2 right now are so great that you couldn’t build a product (or even research) around it. It’s been a recurring issue in Nav2 / SLAM Toolbox that rosbag2 is blocking data sharing in ROS2 because of performance issues and reliability problems. I’d usually agree with that argument, but in this case, I think it’s degraded to a point that these aren’t modest improvements; they’re making it usable from being unusable.

These are fair points. However, it’s worth pointing out that they are the same points that anyone in the history of breakage has relied upon. They can stem from being too close to the project in question-- “I consider my project completely broken without Y”-- when in reality the average user might think it’s perfectly fine. I’m not saying that’s the case here, I just urge caution. Hearing from rosbag2 users here will be useful, but it’s also biased-- many users are not on this forum at all.

Your API will always not be used until it is. You probably won’t know when that time is. Breaking it until you feel like someone’s relying on it is a scream test, and will, well, make people scream :stuck_out_tongue: . If you want the freedom to break your API, take it. Just make it private until you consider it stable enough to not break it. My two cents.

Also a very good point. I’m just a single datapoint, though I’m not close to the project. I just really want it to work well enough to use to setup Nav2/SLAM/perception CI systems and collect data reliably for research experiments :slight_smile:

I know it sounds like a lot of work, but what about renaming rosbag2 and related packages to something else for Foxy? That would bring the new features to those who want them but keep the promise of stable API/ABI :slight_smile:

I was one of the people that initiated this because in the course of the AVP 2020 we realized that rosbag2 was simply useless for us (topics missing in the bag, messages not being recorded, the tool crashing) and the only reason that we were able to bring that project to the end was because of the LGSVL simulator and because of the (extremely inconvenient) debugging in the car.

We also hear the same complaints from the industry players: How to use RTI Recording Service with ROS2 | by Nickolai Belakovski | Medium.

So because rosbag2 is not working this is what happened:

  1. our customers built their own record/replay tools (which means that they are not building on the rosbag2)
  2. we patched rosbag2 in our internal fork of ROS 2 to the point that you could almost say that it is a different product (which means again that we are not building on the rosbag2)

So while I agree with building on top of the stability rather than the latest and greatest - rosbag2 is the one exception in ROS 2 that should’ve been addressed ASAP.

We have however created this post to hear from the average users and to gauge the impact of making an API/ABI breakage. The things that we would like to hear are “I am using rosbag2 API in such and such way” and then we can see if upgrading the user will be easy or not.

Awesome! As a user of the rosbag public API, I’d like to throw in my hat for this as well. Last year I worked on developing a rosbag plugin to render recorded bagfiles immutable or append only via DLTs.

You can view the ROSCon talk, and the WIP package repo here:

During my experimentation, I encountered some practical performance issues (using ROS2 Dashing at the time), and so put further development on the backburner until the underlying write bottleneck was resolved. I briefly explored porting to ROS2 Foxy, but didn’t see much of an improvement and so stalled again. Not having to wait until the next LTS to justify resuming development with end user adoption would be welcomed by me.

I’d also like to ask if rosbag2_storage_plugins is any more reusable as a base plugin now?

1st of all, great work :+1: we will be catching up with this :slightly_smiling_face:

but about the backport, i am so negative on this…

If we want people to build on a product, it needs to be stable. In my opinion, that is more important than features trickling back to a stable release.

agree, that is a really basic policy. THAT GUARANTEES WE DO NOT BREAK THE USERSPACE.

if a data corruption or security issue has to be addressed, we end up changing those (as little as possible), but not for a feature. i think i would even do the opposite, like degrading the performance to keep the userspace.

This feels like a bit of a false point. If that’s true, why have a public API at all? I don’t mean that rhetorically-- you made the decision in the past to have a public API. Should that decision be revisited so that you have the freedom to make these kinds of changes if you truly believe rosbag is a leaf application?

agree, if we do this, the same thing will happen and break the userspace again and again.

i think we’ve reached a point where rosbag2 is considered unusable in Foxy due to performance concerns

that is exactly why we came up with leveldb

where we are standing is,

  • rosbag2 / sqlite3 backend is not really fast enough for some use cases.
  • release distro keeps the ABI/API for release. which is really okay for the application.
  • we may as well create backend to support the requirement that we have. (luckily there is interface!)
  • then we did create backend implementation! which works with our use cases.
  • mainline changes their mind all of a sudden :frowning_face: that is gonna break our implementation… :sob:
  • what should we do??? how can we trust the interfaces???

to be clear, i am not mentioning leveldb specifically. there will be other approaches and implementation which are not public or vendor controlled internally. if we change the interfaces, we literally break the promises.

if we change the interfaces in foxy, could anyone explain what we should’ve done really? no pressure, but i really do not know…

This suggests to me that rosbag2 hadn’t reached a point of maturity by the time of the Foxy release that anybody has been willing to use it beyond the CLI. I would love to see counterexamples, though!

above is my opinion. once we release the public API, i think there is no way to know who uses or relies on that if we think about application as open source perspective.

I’d also like to ask if rosbag2_storage_plugins is any more reusable as a base plugin now?

Good point, +1 on this

btw, i’d like to know if this kind of discussion about breaking API/ABI in release happened before?

thanks :slightly_smiling_face:

My experience was that Foxy bags were not suitable for our case. We needed to record data from a sensor-equipped construction vehicle to achieve two main goals:

  • evaluate new (solid-state) lidars in common tasks of perception
  • reconstruct real environment in the simulation

Unfortunately, we ran into multiple issues, ranging from stability, through missing topics, to having only a fraction of messages in the bag. In the end, we had to use rosbag (ROS1), which was an inconvenience considering that we developed ros2 wrappers for sensors and evaluation benchmarks.

I am certainly biased since I worked on the improvements. However, I believe that with a fast-forward merge we could give Foxy users the first real version of what rosbag2 is supposed to be.

Hi @tomoyafujita

agree, that is a really basic policy. THAT GUARANTEES WE DO NOT BREAK THE USERSPACE.

As Karsten said above, the plan was not to break the command line usage and I have asked again here to list the APIs/ABIs that would break with this proposal.

Secondly please read my reply above Fast Forward Merging rosbag2 master API to Foxy - #9 by Dejan_Pangercic. rosbag2 as of today can not be used. Many users have moved on to create their own solutions for record and replay or used rosbag - how is this better than breaking the API/ABI and to help the users to update?

if a data corruption or security issue has to be addressed, we end up changing those (as little as possible), but not for a feature.

This proposal is not about the new features but about the improvements. The improvements necessitated the change of APIs/ABIs.

That is exactly why we came up with leveldb

where we are standing is,

  • rosbag2 / sqlite3 backend is not really fast enough for some use cases.

Let’s please not add to the scope of this post. leveldb does not solve all of the issues that rosbag2 had, especially on the transport level. Also please note that Robotec.AI’s report concludes that the sqlite3 performance is now actually slightly better than leveldb.

above is my opinion. once we release the public API, i think there is no way to know who uses or relies on that if we think about application as open source perspective.

In my view, all of this can be solved easily with a brief and concise porting guide, which I think we (@karsten, @emersonknapp) should provide if this proposal is accepted.

btw, i’d like to know if this kind of discussion about breaking API/ABI in release happened before?

Are you asking if this ever happened in the ROS world before or in rosbag2? The answer to the latter is certainly no and I vaguely recall rare situations from the former (e.g. indigo ABI break in roscpp after update? - ROS Answers: Open Source Q&A Forum).

The current state of rosbag2 could be considered a bug (functionality is not working as intended) which is largely fixed by the contributions described in the OP by @karsten.

While it’s unfortunate that a fix for this has not been implemented in a fully backward-compatible way, that’s sometimes the case with bug fixes, and is sometimes worth it.

Personally, I’m in favour of breaking API in this case, as I’d rather avoid divergence and fragmentation in ROS 2 (as @Dejan_Pangercic describes has essentially already happened) than rigidly applying rules.

API breaks happened before, e.g. Split of joint_state_publisher and joint_state_publisher_gui and the corresponding PR Split jsp and jsp gui by clalancette · Pull Request #31 · ros/joint_state_publisher · GitHub . It confused the user base, but in the end it somehow went through…

I support the backport.

The reason is simple: We have already switched to the rosbag2 version from rolling, even though we remain on foxy for the rest of our system. This is because the version currently in foxy not only has performance issues, but is also lacking many important features in the rosbag2 Python API.

thanks all for sharing requirements and opinions :+1:

the plan was not to break the command line usage

i did not mean only CLI but all interfaces exposed to user application.

Many users have moved on to create their own solutions for record and replay or used rosbag - how is this better than breaking the API/ABI and to help the users to update?

i was wondering who the “many users” here are. i was thinking that we cannot exactly know who actually uses it… we would eventually break someone’s CI or build system with an unexpected modification.

leveldb does not solve all of the issues that the rosbag2 had, especially on the transport level.

we always have (and will have) issues and improvements, so i wasn’t clear on why this improvement is an exception. at least we should have some logic for making that decision, otherwise it would be easy to break the userspace again.

Also please note that Robotec.AI’s report concludes that the sqlite3 performance is now actually slightly better than leveldb.

yes, i am aware of that, which is a good thing for everyone. probably we don’t have to maintain the leveldb implementation anymore! (actually we’ve been considering details and scenarios including the aarch64 platform) btw, we are not pushing leveldb at all, we just want to have something that works for us.

The answer to the latter is certainly no and I vaguely recall rare situations from the former (e.g. indigo ABI break in roscpp after update? - ROS Answers: Open Source Q&A Forum).

API breaks happened before, e.g. Split of joint_state_publisher and joint_state_publisher_gui and the corresponding PR Split jsp and jsp gui by clalancette · Pull Request #31 · ros/joint_state_publisher · GitHub . It confused the user base, but in the end it somehow went through…

appreciate you pointing these out! so after all it happened before. and according to <no title> and https://semver.org/ we need to bump the major version to break the API. that said, is it not a hard guarantee? please correct me if i am wrong, this is really important for the architecture and the entire build system.

thanks in advance :smiley:

Yes, we have had API and ABI breaks before. But as @Dejan_Pangercic says, it is a pretty rare situation and the problem that gets fixed has to be very severe.

That change was meant to be transparent, but I messed up. However, in my defense, nobody tested it out in the testing repository or ever reported a bug until several months later. (anyway, I don’t want to hijack this thread for this)

For the current situation with rosbag2, I’m still on the fence about it. I agree with @Dejan_Pangercic that rosbag2 in Foxy largely doesn’t meet the needs of users. But it does also feel like breaking the API/ABI here is breaking our promise to users. As @kyrofa says; this thread will reach some users, but many of them will not see it at all. So it is hard to determine what the scale of the breakage will be.

I will note that rosbag2 has not yet declared version 1.0, nor has it declared a REP-2004 quality level for Foxy (or yet on master) - so according to the ROS and semver guidelines we won’t have broken any contract and would only need to bump the minor version. I’d like to declare 1.0 and a quality level for Galactic which I think would create a hard exclusion for doing this ever again.

Just making an argument from the “letter of the law” :slight_smile: - it’s not a final argument.
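
The “letter of the law” reading above can be written down as a small rule. The sketch below is only an illustration of that reading: the helper name `next_version_after_break` and the version numbers are hypothetical, and treating a breaking change in a 0.y.z package as a minor bump is the interpretation taken in this post, not something the sketch proves.

```python
def next_version_after_break(version: str) -> str:
    """Return the version a breaking change would move to under this reading.

    Assumption: for major version 0 (initial development, public API not
    yet considered stable) a breaking change only bumps the minor number;
    from 1.0.0 onwards it requires a major bump.
    """
    major, minor, _patch = (int(part) for part in version.split("."))
    if major == 0:
        return f"0.{minor + 1}.0"
    return f"{major + 1}.0.0"


# Hypothetical version numbers, for illustration only.
assert next_version_after_break("0.3.5") == "0.4.0"
assert next_version_after_break("1.2.3") == "2.0.0"
```

Declaring 1.0 moves a package from the first branch to the second, which is why it would rule out a repeat of this kind of break within a distro.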

I see parallels with the thread over here about whether people should be using the most recent distributions or sticking with older, more stable stuff.

FYI, I’m a new ROS user and have been evaluating it for use in a product I’m working on. We’re using the NVIDIA Jetson which only has Ubuntu 18, so I’m stuck on Eloquent. Recently I tried using rosbag2 but the ros-eloquent-ros2bag version doesn’t even implement --loop??

This was really the final straw and I’ve decided to give up on ROS2 and try ROS1 instead, or potentially skip ROS altogether. Releasing incomplete software under official-sounding names like rosbag2 is a great way to make people like me lose trust in the ecosystem.

This seems like a very reasonable solution to the current tension here.
