Evaluation of robotics data recording file formats

Regarding the encryption, you are correct: it is possible to encrypt the whole file after the fact instead of having encryption built in. There are some downsides to not having the format or the tooling handle encryption natively, though. For example, when looking at an encrypted bag, you would first have to decrypt the bag as an extra step before being able to do anything with it. This also means the decrypted data has to be stored on disk, increasing the risk of data leakage.

In the rosbag case, however, we chose to implement encryption the same way compression is implemented. That way, all rosbag commands still work and encryption/decryption is handled completely transparently. From the user's perspective, the workflow for encrypted bags is the same as for non-encrypted bags. There are no extra steps to decrypt the bag beforehand. The point of decrypting on the fly is to keep the data always encrypted at rest (on disk). A similar argument applies to compression: it would be a pain to have to decompress the bags and store them on disk before being able to use them.
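To illustrate what "handled the same way as compression" could look like, here is a minimal sketch, not rosbag's actual implementation. The `CODECS` registry, `write_chunk`, `read_chunk`, and the name-prefixed chunk layout are all hypothetical. Only `zlib` is wired in; a real cipher would be registered in the same place as another encode/decode pair.

```python
import zlib
from typing import Callable, Dict, Tuple

# Hypothetical registry: codec name -> (encode, decode).
# An encryption codec would be added here as one more pair.
Codec = Tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]
CODECS: Dict[str, Codec] = {
    "none": (lambda b: b, lambda b: b),
    "zlib": (zlib.compress, zlib.decompress),
}


def write_chunk(payload: bytes, codec: str) -> bytes:
    """Encode a chunk and prefix it with the codec name (assumed layout)."""
    encode, _ = CODECS[codec]
    name = codec.encode("ascii")
    return bytes([len(name)]) + name + encode(payload)


def read_chunk(stored: bytes) -> bytes:
    """Decode a chunk in memory; the decoded bytes never touch the disk."""
    name_len = stored[0]
    name = stored[1:1 + name_len].decode("ascii")
    _, decode = CODECS[name]
    return decode(stored[1 + name_len:])
```

Because the reader picks the decoder from the chunk itself, callers use `read_chunk` the same way whether the chunk was stored plain, compressed, or (with a cipher registered) encrypted.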

To answer your comment about the use case more directly, encrypting the data is the best way to protect potentially sensitive information (such as PII from camera topics and others) when the robot is deployed in a customer facility (I am not talking about a lab environment but more of an industrial environment).

Regarding my comment about compression, I did not word my sentence properly. What I meant was that in rosbag, the encryption and compression are handled the same way. I did not mean we implemented a different compression method (just the encryption part). Sorry about the confusion there.

@Guillaume_Autran I think we are now covering the encryption case in the new format. See my edits in the format draft.
I added a new field in “Topic Info”

| 4+N | compression | String | Per message compression | “none”, “jpeg” |

and a special field in the message record.

| 1 | compressed | uint8 | Flag indicating if the message is compressed according to the compression type specified in the Topic Info |

You can specify the encryption type in the “compression” field in Topic Info and check the “compressed” flag to determine whether a message is encrypted.
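As a rough sketch of how a reader might use these two fields together: the byte layout below (one flag byte followed by the payload), the `TopicInfo` class, and the `"zlib"` value are assumptions for illustration only. The draft itself lists “none” and “jpeg” as values, and an encryption type would be handled in the same branch.

```python
import zlib
from dataclasses import dataclass


@dataclass
class TopicInfo:
    name: str
    compression: str  # "zlib" is a stand-in here, not a value from the draft


def encode_message(topic: TopicInfo, payload: bytes, compress: bool) -> bytes:
    """Write the 1-byte `compressed` flag, then the (possibly encoded) payload."""
    if compress and topic.compression == "zlib":
        return b"\x01" + zlib.compress(payload)
    return b"\x00" + payload


def decode_message(topic: TopicInfo, record: bytes) -> bytes:
    """Check the flag first; only consult Topic Info when the flag is set."""
    flag, body = record[0], record[1:]
    if flag == 0:
        return body
    if topic.compression == "zlib":
        return zlib.decompress(body)
    raise ValueError(f"unsupported compression: {topic.compression}")
```

The per-message flag lets a writer skip encoding for individual messages on a topic without changing the topic-level setting.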

Ok, thanks @MichaelOrlov. I’ll move my comments into the document now.

@gbiggs Thanks for all this information. I spent some time evaluating MKV and Tawara based on this comment, and wanted to share some questions and understandings I reached. I want to say off the bat that I think the idea of using MKV for this is very clever, and I agree it has a lot of potential.

  1. ROS bags allow a new connection to appear at any point during message recording - connections can be interspersed with the messages. In an MKV/Tawara file, all “tracks” are written to a contiguous section before the data/clusters appear, so it is best if they are known ahead of time. I think there are a few possible answers to this:
  • MKV/Tawara provide facilities for reserving space in elements that can be updated later. It’s possible this could be used to add new tracks as they appear. However, you must be certain to reserve enough space for all tracks that may appear, which is tricky to get right if schemas can be large.
  • Another way around this is to record new connections to new, single-track MKV files. This has the drawbacks of resulting in more files to manage, as well as introducing ambiguity around which files the chapter/other metadata should land in.
  • Another way is to record the different sections of the MKV file to different files, and concatenate them in post-processing.

Any thoughts on how this would be handled?

  2. In ROS bags, a “chunk” covers a set of heterogeneously-typed messages. In an MKV/Tawara file, there are “clusters” of “blocks” of “frames”. Within the clusters, each block would consist of frames on only one “track”, and each block would be individually compressed (assuming we make some kind of compression a supported codec). To get the equivalent of a chronological scan through frames in the file (equivalent to a sequential read over ROS messages), we would need to do an N-way merge of individually-decompressed streams, where N is the number of topics in the file (potentially thousands). There is not, as far as I can tell, support for compression/encoding at the cluster level. If that’s correct, one thing I wonder about is whether this might have significantly different performance characteristics from typical usage of MKV, where the number of streams is more limited (if that’s even a correct assumption!).
  • A related thought that occurred to me was that the blocks in clusters seem to be aligned on time (since all blocks in a cluster are stamped relative to the same cluster time), meaning a cluster may contain a few blocks with large numbers of frames and a lot with very few. The MKV specification recommends clusters of 5MB or 5s, while some robots may record at hundreds or thousands of megabytes per second, potentially with individual messages exceeding 5MB. Will off-the-shelf MKV tooling handle this? I have no experience working with the format personally.
  3. The ROS bag format supports a two-layer index, where the section at the end of the file is an index of chunks, and there is an index of messages after each chunk. This avoids the need to consider or download message indexes for irrelevant chunks when consulting the index for messages in a particular time range.

In MKV/Tawara, the index is the “Cues” element that comes after the clusters. It is not clear to me if this ends up being a message-granular index in practice. In the worst case it probably has an entry for every block on every track, and depending on the dimensions of your recording it’s possible you’re often recording just a message or two per block. The full index of cues would need to be downloaded to seek to a particular time offset in the file. Does this end up being a very large index/large download?

Appreciate any input/corrections. The idea of MKV for recording seems interesting to me, but I’m still unclear to what extent this is “proven ground” for the format, what assumptions might break when recording very large numbers of tracks at very high or track-variable volumes, and to what extent this can be retrofitted onto ROS given hassles like unknown messages appearing after recording start.
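For reference, the N-way merge described in point 2 can be sketched with Python's standard `heapq.merge`. The `Frame` tuple layout and the `merge_tracks` helper are made up for illustration, and the sketch assumes each track's stream is already decompressed and sorted by timestamp.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

Frame = Tuple[int, str, bytes]  # (timestamp, track name, payload), assumed layout


def merge_tracks(tracks: List[Iterable[Frame]]) -> Iterator[Frame]:
    """Lazily merge per-track streams into one chronological stream."""
    return heapq.merge(*tracks, key=lambda frame: frame[0])
```

The merge keeps one pending frame per track in a heap, so each output frame costs O(log N) comparisons for N tracks. With thousands of tracks that is workable, but every track also needs its own decoder state held open for the whole scan.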

Off the top of my head, the way I would handle it given the current specification would be to record a tracks element at the start of the file with the tracks known at recording start. Then, if new tracks appear, I’d either blank out that element and record a new tracks element partway through the file, or keep tracks in memory and at recording end, if the tracks have changed since recording start, blank out the original tracks element and record a new one at the end of the file. I think ideally I’d prefer to have the new tracks element appear partway through the file and be merged with the one from the file start, because that would maintain the ability to stream the bag.
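A minimal sketch of the "merge tracks elements as they appear" idea from the reader's side. This does not parse real EBML: the `("tracks", ...)` and `("block", ...)` tuples and the `stream` function are a hypothetical stand-in for already-parsed elements, and whether the specification permits more than one tracks element is an assumption here.

```python
from typing import Dict, Iterable, Iterator, Tuple, Union

Tracks = Dict[int, str]         # track number -> topic name
Block = Tuple[int, int, bytes]  # (track number, timestamp, payload)
Element = Tuple[str, Union[Tracks, Block]]


def stream(elements: Iterable[Element]) -> Iterator[Tuple[str, int, bytes]]:
    """Play elements in file order, merging any tracks element into the table."""
    known: Tracks = {}
    for kind, body in elements:
        if kind == "tracks":
            known.update(body)  # later tracks elements extend the earlier ones
        elif kind == "block":
            track, stamp, payload = body
            yield known[track], stamp, payload
```

This only works for streaming if a track's entry is written before the first block that uses it, which is the ordering the writer would have to guarantee.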

I don’t know how the tooling would handle it, but I agree that the MKV recommended chunk size would not be suitable for robotics. The recommended chunk size in MKV is oriented towards how many frames you end up storing in a chunk, rather than the size of the chunk in bytes. The idea is for the chunks to be large enough to provide efficient indexing but small enough to provide fine-grained indexing.

The size of the cues element depends on how many places in the bag you want to index. You can index every chunk, or index every minute, or just index specific times in the file you think are important. I don’t think it would be sensible to have a cuepoint for every block in the bag. However, I can agree with having the ability to have multiple cue elements; I’m not sure if the current specification allows that. It might, because of the segment concept that allows files to be split up.
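To make the trade-off concrete, here is a small sketch of a sparse cue index: keep one cue per time interval, then seek by finding the last cue at or before the target and scanning forward from there. The `(timestamp, offset)` tuples and both function names are assumptions for illustration, not part of the MKV or Tawara specifications.

```python
import bisect
from typing import List, Tuple

Cue = Tuple[int, int]  # (timestamp, byte offset of a cluster), assumed layout


def build_cues(clusters: List[Tuple[int, int]], interval: int) -> List[Cue]:
    """Keep one cue per `interval` time units instead of one per cluster."""
    cues: List[Cue] = []
    for stamp, offset in clusters:
        if not cues or stamp - cues[-1][0] >= interval:
            cues.append((stamp, offset))
    return cues


def seek_offset(cues: List[Cue], target: int) -> int:
    """Return the offset of the last cue at or before `target`."""
    index = bisect.bisect_right([stamp for stamp, _ in cues], target) - 1
    return cues[max(index, 0)][1]
```

A larger `interval` shrinks the index that has to be downloaded, at the cost of a longer forward scan after each seek.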

The MKV format, and by extension the Tawara format, supports streaming. As long as you’re not trying to seek in the bag, the cues element is irrelevant. The bag is playable without it, so you can start playing a bag as soon as you have the first block available (assuming the tracks are known).

Thanks for your thoughts!

Has anyone looked at the pcapng file format for inspiration? I don’t think it’s directly usable, but it is designed to capture data extremely fast.

Another HDF5-based data format from the autonomous vehicle area is the omega format, developed by RWTH Aachen University. In contrast to the eCAL HDF5 format, it needs no extra serialization for the stored messages. Just mentioning it for inspiration :)

As a follow-up to this discussion, the MCAP file format is in beta status and language implementations are ready for testing. More information in the thread here: MCAP: A recording file format for pub/sub systems, designed with robotics in mind