This document is motivated by developing tooling that interoperates with ROS1, ROS2, and other robotic frameworks and speaking with customers about the recording file formats that exist today. I think there is space for a next-generation recording format similar to the ROS1 .bag (v2.0) format, with the flexibility to support ROS2’s pluggable middleware and other robotics frameworks. Before jumping into a discussion on how to lay out bytes on disk though, this evaluation attempts to distill requirements and review what exists today.
If you do write data to disk, PLEASE use a format for which readers are already common. It would be good to be able to read it directly using R or Python without first needing to import anything connected with ROS.
The goal should be to leverage the large ecosystem of data analysis tools that are already in common use in other industries (such as data science).
I/O performance should be a concern too.
Thank you for the feedback. Integrating with the existing ecosystem of data science and analysis tooling is a requirement that should be spelled out. I can add a section to the document covering this.
Unfortunately, there are no Python libraries you can import today that will parse CDR payloads given a ROS2 msgdef or DDS IDL. The next best thing we can do is create a standalone library that is published to package repositories and installable in as many environments as possible, and allow this library to understand the contents of a recording without referencing a ROS installation or any external data dependencies. This is addressed in the requirements section.
We (at Continental Automotive R&D) have had good experiences combining HDF5 (as a container format) with Google Protobuf as a message serialization format. The HDF5 container provides the generic structure of the measurement, with a global measurement header (containing topic names, message descriptors, additional tags), and HDF5 takes care of features like file splitting, compression, interfaces to different languages and so on.
Every recorded sample then just contains timestamps (sender and receiver, for latency analysis), a monotonic counter (to detect drops) and the serialized, dynamically sized payload byte array.
Protobuf is certainly only ONE option and can be replaced by another serialization format if needed. In that case the outer HDF5 API still works; only the second-level processing that decodes the messages needs to be adapted.
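To make the per-sample layout described above concrete, here is a minimal sketch in plain Python. It deliberately does not use HDF5; the `HEADER` layout, field order and function names are assumptions chosen for illustration, not the format used at Continental.

```python
import struct

# Hypothetical fixed-size sample header (an assumption for this sketch):
# send time (ns), receive time (ns), monotonic counter, payload length.
# Little-endian, no padding.
HEADER = struct.Struct("<QQQI")


def pack_sample(send_ns, recv_ns, counter, payload):
    """Prefix an opaque, already-serialized payload with the sample header."""
    return HEADER.pack(send_ns, recv_ns, counter, len(payload)) + payload


def unpack_samples(buf):
    """Yield (send_ns, recv_ns, counter, payload) for each sample in buf."""
    offset = 0
    while offset < len(buf):
        send_ns, recv_ns, counter, size = HEADER.unpack_from(buf, offset)
        offset += HEADER.size
        payload = bytes(buf[offset:offset + size])
        offset += size
        yield send_ns, recv_ns, counter, payload


# The payload bytes are never interpreted here; they could be Protobuf,
# CDR, or anything else.
blob = pack_sample(1_000, 1_250, 0, b"\x08\x01") + pack_sample(2_000, 2_300, 1, b"")
samples = list(unpack_samples(blob))
```

The reader only needs the header layout to walk the samples; decoding the payload is a separate, second-level step.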
Why is it desirable to have the recording format be serialization agnostic? I can see the benefit, but it seems that an additional storage plugin to rosbag2 may have a dedicated serialization format, and that seems fine. I really like the idea of adding protobuf as a storage plugin for rosbag2…
@aposhian the distinction is between the container format and the message serialization/encoding format. An analogy to video files would be mp4 (container) vs h264 (encoding). The container format needs to be able to support multiple message serialization formats, because:
- Robots already use different message serialization formats (e.g. ROS 1 msg, Protobuf, and ROS 2 supports pluggable RMW with different serializations)
- For performance and stability reasons, robots should not be required to re-serialize their messages into a different format during recording
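The container/encoding split can be sketched as follows. This is only an illustration under stated assumptions: the `Recording` class, the `DECODERS` registry and the encoding names `"json"` and `"f64le"` are hypothetical and do not correspond to any existing rosbag2 or recording-format API.

```python
import json
import struct

# Hypothetical decoder registry keyed by encoding name. A real reader would
# register ROS 1 msg, CDR, Protobuf, etc. decoders here.
DECODERS = {
    "json": lambda data: json.loads(data.decode("utf-8")),
    "f64le": lambda data: struct.unpack("<d", data)[0],
}


class Recording:
    """Toy in-memory container: it stores payloads as opaque bytes."""

    def __init__(self):
        self.channels = {}  # topic -> encoding name
        self.messages = []  # (topic, log_time_ns, raw payload)

    def add_channel(self, topic, encoding):
        self.channels[topic] = encoding

    def write(self, topic, log_time_ns, payload):
        # The payload is stored exactly as received: no re-serialization.
        self.messages.append((topic, log_time_ns, bytes(payload)))

    def read(self):
        for topic, log_time_ns, payload in self.messages:
            decoder = DECODERS.get(self.channels[topic])
            # Unknown encodings are still readable as raw bytes.
            value = decoder(payload) if decoder else payload
            yield topic, log_time_ns, value


rec = Recording()
rec.add_channel("/status", "json")
rec.add_channel("/temp", "f64le")
rec.add_channel("/raw", "cdr")  # no decoder registered for this one
rec.write("/status", 1, b'{"ok": true}')
rec.write("/temp", 2, struct.pack("<d", 21.5))
rec.write("/raw", 3, b"\x00\x01")
decoded = list(rec.read())
```

The write path never touches a decoder, so recording cost does not depend on the serialization format; only readers need to know how to interpret each channel's encoding.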
Thank you, that is helpful.
@john-at-foxglove.dev you write that:
"There are many different concepts of time in a recording: the time a message was originally published, the time when it was sent by the publisher (this can be very different for messages that are held in a queue and replayed), when the data recording process received it, or when the recorder wrote it to disk. Using a consistent definition of time is the most critical requirement while storing multiple timestamps can enable additional use cases."
To me it seems that the time that the publisher sent the message is not very relevant to creating a deterministic playback of the system unless you store something about the network topology. The current implementation of rosbag or rosbag2 is a single observer running at a particular place in the network, so it makes sense to me that all of its timestamps should be relative to the receipt of those messages, rather than when the publisher published them. However, as you mentioned, queuing or failure on the recorder’s part to receive messages fast enough could distort things.
Regarding the timestamps mentioned before, it's very helpful to record both (the publication time and the time when the recorder finally received the message).
You can decide later which one to use or how to correct for latencies, but at least you have the chance to detect latencies in your distributed system.
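As a rough sketch of what having both timestamps and a counter enables after the fact, the following computes per-message latency and counts dropped messages from gaps in the counter. The tuple layout and the `analyze` function are assumptions for illustration; the latency values are only meaningful if the publisher and recorder clocks are synchronized.

```python
def analyze(samples):
    """samples: iterable of (publish_ns, receive_ns, counter) in receive order.

    Returns (latencies_ns, dropped_count).
    """
    latencies = []
    dropped = 0
    prev_counter = None
    for publish_ns, receive_ns, counter in samples:
        latencies.append(receive_ns - publish_ns)
        # A jump of more than one in the counter means messages went missing.
        if prev_counter is not None and counter > prev_counter + 1:
            dropped += counter - prev_counter - 1
        prev_counter = counter
    return latencies, dropped


# Counter value 2 never arrived at the recorder.
samples = [(100, 130, 0), (200, 225, 1), (400, 480, 3)]
latencies, dropped = analyze(samples)
```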
An additional option is to not run one single recording instance anywhere in your network but to run one recording node on every host that is part of your infrastructure. All these instances are orchestrated (started, stopped, configured) by a central recording master application. This is how the decentralized eCAL recording is designed.
The advantage is that your recording nodes ‘nearly see the same timing’ as the user nodes running on the same machine and for sure the recording is not adding extra load on the network to collect data from other hosts. For later deterministic replay you can reconstruct the messages for every node more closely to the original timing seen at recording time on the hosting machine.
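For replay or analysis, the per-host recordings would need to be combined into one timeline. A minimal sketch, assuming each host's recording is already sorted by receive time and that the host clocks are synchronized (the tuple layout and the names are hypothetical, and this is not how eCAL implements it):

```python
import heapq


def merge_recordings(*recordings):
    """Merge per-host message lists, each sorted by receive time, into one
    list ordered by receive time."""
    return list(heapq.merge(*recordings, key=lambda msg: msg[0]))


# (receive_ns, host, topic)
host_a = [(100, "host_a", "/camera"), (300, "host_a", "/camera")]
host_b = [(150, "host_b", "/lidar"), (250, "host_b", "/lidar")]
merged = merge_recordings(host_a, host_b)
```

`heapq.merge` streams from the sorted inputs instead of re-sorting everything; wrapping it in `list` here is only for convenience in the example.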
This may be off-topic, but what about also thinking about where non-file-based recording mechanisms fit into all of this (e.g. feeding into a time-series database)? There is merit to having files that are easy to pass around and load into different programs, but all file-based solutions will eventually hit performance bottlenecks. The alternative is using a time-series database, which could carry higher overhead on the machine where the recorder is running, but could also potentially provide higher write throughput.
I think there are two separate approaches to this:
How can I get real time data from my robot into a time series database?
Usually you need to first solve the problem of how your robot is connected to a server to ingest this real time data. There are several great fleet management tools available that can do this (Formant, Freedom Robotics, InOrbit, etc). It would also be nice to see some lightweight open source tooling to support more simple use cases (Transitive Robotics is working on something in this space).
How can I record data on the robot and then later make it accessible in a time series database?
Typically on the robot you want to get messages written to disk with as little CPU or I/O overhead as possible, which is where a file-based recording format excels. However, I believe a good robotics data file format should come with an ecosystem of tools and libraries to make it easy to convert the data into different formats, which could make it easy to ingest and analyze in a time series database after the fact.
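As a sketch of the "convert after the fact" path, the following loads already-decoded messages into a table indexed by topic and time, then runs a time-range query. SQLite is used here only as a stand-in for a real time series database because it ships with Python; the table layout and function names are assumptions for illustration.

```python
import sqlite3


def ingest(conn, messages):
    """messages: iterable of (log_time_ns, topic, value) rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages "
        "(log_time_ns INTEGER, topic TEXT, value REAL)"
    )
    # Index by topic and time so range queries do not scan the whole table.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_topic_time "
        "ON messages (topic, log_time_ns)"
    )
    conn.executemany("INSERT INTO messages VALUES (?, ?, ?)", messages)
    conn.commit()


def query_range(conn, topic, start_ns, end_ns):
    """Return (log_time_ns, value) rows for topic within [start_ns, end_ns]."""
    cursor = conn.execute(
        "SELECT log_time_ns, value FROM messages "
        "WHERE topic = ? AND log_time_ns BETWEEN ? AND ? "
        "ORDER BY log_time_ns",
        (topic, start_ns, end_ns),
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
ingest(conn, [
    (100, "/battery", 12.6),
    (200, "/battery", 12.5),
    (300, "/battery", 12.4),
    (150, "/temp", 40.0),
])
result = query_range(conn, "/battery", 150, 300)
```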