In an age long past (10 years ago), @Ingo_Lutkebohle and I did quite a bit of work looking at improving the format used for recording bags in ROS 1. There were three different approaches we looked at.
HDF5R
A format built on HDF5, which has been mentioned so many times. I can’t remember much about what we did, but there’s a partial implementation here. I don’t think we got very far; the work was probably mainly prototyping and benchmarking. I can’t say why the work was dropped, but I do agree that HDF5, while very widely used and supported, has limitations that reduce its usefulness for us.
Extended ROS 1 bag format
@Ingo_Lutkebohle and his colleagues at Bosch did most of the work on this one. There was a website describing the format, but I can’t find it now. It was an extended version of the rosbag format to deal with some of the shortcomings of that format.
An EBML format based on the Matroska format
I did most of the work on this one, so it’s the one I know the most about. Matroska uses a file format called EBML, which stands for Extensible Binary Meta Language. You can think of it as a binary form of XML. An advantage of EBML is that you can specify a schema that defines the file structure, which is exactly what Matroska does.
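To make the "binary XML" idea a little more concrete: every EBML element is an ID, followed by a data size, followed by the data itself, and the size is stored as a variable-length integer whose leading zero bits tell the parser how many bytes it occupies. The following is a minimal sketch of that size coding, not code from the tawara implementation; the function names `encode_vint` and `decode_vint` are made up for illustration.

```python
from __future__ import annotations


def encode_vint(value: int) -> bytes:
    """Encode a non-negative integer as an EBML variable-length integer."""
    if value < 0:
        raise ValueError("value must be non-negative")
    for length in range(1, 9):
        # A VINT of `length` bytes carries 7 * length data bits. The all-ones
        # pattern is reserved (it means "unknown size"), so it is excluded.
        if value < (1 << (7 * length)) - 1:
            marker = 1 << (7 * length)  # marker bit sits just above the data bits
            return (marker | value).to_bytes(length, "big")
    raise ValueError("value too large for an 8-byte VINT")


def decode_vint(data: bytes, offset: int = 0) -> tuple[int, int]:
    """Return (value, bytes consumed) for the VINT starting at data[offset]."""
    first = data[offset]
    if first == 0:
        raise ValueError("VINTs longer than 8 bytes are not supported")
    length = 9 - first.bit_length()  # number of leading zero bits, plus one
    if offset + length > len(data):
        raise ValueError("truncated VINT")
    raw = int.from_bytes(data[offset:offset + length], "big")
    value = raw & ((1 << (7 * length)) - 1)  # strip the marker bit
    return value, length
```

For example, `encode_vint(1)` gives the single byte `0x81`, while `encode_vint(127)` needs two bytes (`0x40 0x7f`) because the one-byte all-ones pattern is reserved. Because a parser always knows an element's size before reading its data, it can step over anything it does not understand.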
While I was able to define the complete format and produce part of an implementation (“tawara” means “sack” in Japanese - I spent way too much time trying to choose a “cool name”), I unfortunately got shifted to a different project before I had time to complete it. It was also up against the Jupiter-sized inertia of the existing rosbag format by that point, and I don’t think it would have ever been adopted even had I completed an implementation. I did, however, get far enough to convince myself that the format solved all the problems of the rosbag format and then some.
The Matroska format looks very complex, and it is, because it has to deal with all of the foibles of various media formats to enable it to be a flexible container format. Fortunately, we didn’t need most of that complexity (no differentiation between audio and video, for example, or support for 3D video), so the adapted format is much simpler while retaining all of the flexibility of a container format. Because ultimately that’s what we need for rosbag2: a fast-to-write, easy-to-read container format.
I remember from when I was doing this work that although the format has more complexity than the rosbag format or the format that Ingo’s team produced, it had advantages and additional features that could have been useful. Some that I can remember are:
- Ability to write data fast and index it later (or not at all if you don’t care about easy seeking)
- Fast seeking, when index information is available
- Chapters, for rapid jumps to interesting parts of a bag file
- Can store the schema of messages, as well as any other attachment you like (thumbnail of the visualised data, for example)
- Serialisation-agnostic, including a different serialisation format for each topic if that’s your thing
- Robust to errors, including the use of CRC-32 checksums, the ability to skip corrupted elements, and re-indexing after recording
- Robust to version changes, as unknown elements can easily be skipped, allowing old parsers to play files produced by newer versions of the format (to a limit, of course)
- Files can be rewritten in place on disc to a degree, if consideration is given at recording time. This is most commonly used to allow index information to be added later without changing the file size, especially when splitting files
- Segmentation, which means that not only can you split it into multiple files, you can choose which of those files to play back in what order later on - more useful for AV data than robotics data, but could be useful to splice multiple scenes together
- Support for tags, such as when produced, robot serial number, who reviewed it, information on the parser that produced it, or whatever else you might want to tag a bag file with
- Relative time stamps, meaning you can move an entire block in time by shifting the block time stamp - useful for adding a sudden delay in data for testing, for example
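The relative time stamp item in the list above is easiest to see with a small model. This is only a sketch under assumptions: it supposes a Matroska-like layout in which a group of messages carries one absolute base time stamp and each message stores just an offset from it. The names `MessageGroup`, `Message`, `base_ns` and `offset_ns` are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Message:
    offset_ns: int  # time relative to the owning group's base time stamp
    payload: bytes


@dataclass
class MessageGroup:
    base_ns: int  # absolute time stamp shared by the whole group
    messages: List[Message] = field(default_factory=list)

    def absolute_times(self) -> List[int]:
        """Resolve each message's absolute time from base plus offset."""
        return [self.base_ns + m.offset_ns for m in self.messages]


group = MessageGroup(
    base_ns=1_000_000_000,
    messages=[Message(0, b"scan 1"), Message(20_000_000, b"scan 2")],
)
before = group.absolute_times()

# Inject a 0.5 s delay: only the single base value is rewritten,
# the stored per-message offsets are untouched.
group.base_ns += 500_000_000
after = group.absolute_times()
```

Shifting the group touches one integer regardless of how many messages it holds, which is what makes the "sudden delay for testing" trick cheap.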
Keep in mind that this was done 10 years ago, when we all knew a lot less about what we need from a recording format. There are some things I think I would change or add in the format now. A few I can think of off the top of my head are:
- Explicitly storing the message schema in the topic information rather than using an attachment per track.
- Rename some elements to describe what they actually are, rather than reusing the names from the Matroska specification as-is.
- Add an element to the header containing the PNG/HDF5/MCAP magic bytes to detect transfer errors, because those look useful.
- Add an additional timestamp field for “message received time”.
- Provide space for whole-file information such as earliest and latest time stamps, total message count, etc. Information that can be calculated and written after recording is complete.
- Update the format to comply with the new EBML RFC draft, including providing an XML-format schema, and remove redundant information defined in the RFC, such as the definition of void and CRC-32.
- Change the specification to be Markdown instead of reStructuredText? Depends on table-rendering capability.
- Choose a better name.
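As a rough illustration of why PNG-style magic bytes help with transfer errors: the signature deliberately contains a byte with the high bit set, a CR-LF pair, a Ctrl-Z and a bare LF, so the usual ways a file gets mangled in transit each leave a recognisable fingerprint. In the sketch below, `PNG_SIGNATURE` is the real PNG signature, while `BAG_SIGNATURE` and `diagnose_signature` are hypothetical stand-ins made up for this example, not part of any specification.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the real 8-byte PNG signature
BAG_SIGNATURE = b"\x89TWR\r\n\x1a\n"  # hypothetical signature, same layout


def diagnose_signature(header: bytes, expected: bytes = BAG_SIGNATURE) -> str:
    """Classify the start of a file by comparing it with the expected magic."""
    if header.startswith(expected):
        return "ok"
    # A 7-bit transfer clears the high bit of the first byte (0x89 -> 0x09).
    if header[:1] == bytes([expected[0] & 0x7F]):
        return "high bit stripped"
    # Text-mode transfer that collapsed CR-LF into a bare LF.
    if header.startswith(expected.replace(b"\r\n", b"\n")):
        return "CRLF converted to LF"
    # Text-mode transfer that expanded every LF into CR-LF.
    if header.startswith(expected.replace(b"\n", b"\r\n")):
        return "LF converted to CRLF"
    return "unrecognised"
```

A reader would run this on the first few bytes before parsing anything else, for example `diagnose_signature(open(path, "rb").read(16))`, and refuse to continue with a helpful message unless the result is `"ok"`.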
I’d love to revise this format and produce an implementation. In case anyone is interested in iterating on this format, I’ve pushed the specification to a new repository. Fixes, additions, improvements, and other useful contributions are welcome! I can move the repository somewhere less “belongs to me” as well, if that’s preferred (perhaps an OSRF repository?).