Introducing mcap-etl: MCAP to TimescaleDB (+ more dbs soon)

Hey all,

We’re thrilled to introduce mcap-etl, an open-source package that loads the contents of MCAP files into a database.

Installation

Installation is straightforward with pip:

pip install mcap-etl

The Challenge

Working with MCAP (or ROS Bag) files presents two significant challenges:

  1. Storage Consumption: These files can quickly balloon in size, often necessitating on-demand downloads from cloud storage to avoid using excessive local disk space.

  2. Serialized Structure: The serialized format of these files, while useful for controlling file size, requires developers to create custom scripts for each use case or build complex cloud infrastructure with custom data parsing and extraction pipelines.

Our Solution

We designed mcap-etl to alleviate these problems, starting with a transformation pipeline from MCAP to TimescaleDB. This lets you run time-series queries on your ROS data. For every topic, our package creates a table, and for every message, it writes a record.
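To make the topic-to-table and message-to-record mapping concrete, here is a minimal sketch of the idea. It is not mcap-etl's actual code: the function names (topic_to_table_name, flatten_message, to_record), the underscore-joined column names, and the "ts" column placement are all assumptions made for illustration, and messages are assumed to be already decoded into plain dicts.

```python
import re
from typing import Any, Dict, Tuple


def topic_to_table_name(topic: str) -> str:
    """Turn a ROS topic such as '/robot/battery_state' into a SQL-safe name.

    Hypothetical naming rule: non-alphanumeric runs become underscores.
    """
    name = re.sub(r"[^0-9a-zA-Z]+", "_", topic).strip("_").lower()
    return name or "unnamed_topic"


def flatten_message(msg: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested fields into column-like keys.

    Example: {'header': {'frame_id': 'base'}} -> {'header_frame_id': 'base'}
    """
    row: Dict[str, Any] = {}
    for key, value in msg.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested sub-messages, extending the column prefix.
            row.update(flatten_message(value, prefix=f"{column}_"))
        else:
            row[column] = value
    return row


def to_record(topic: str, ts: float, msg: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Map one decoded message to (table name, row), with the timestamp in 'ts'."""
    row: Dict[str, Any] = {"ts": ts}
    row.update(flatten_message(msg))
    return topic_to_table_name(topic), row


if __name__ == "__main__":
    table, row = to_record(
        "/battery_state",
        1700000000.0,
        {"header": {"frame_id": "base"}, "voltage": 12.4},
    )
    print(table, row)
```

Each (table, row) pair would then become one INSERT into the table for that topic. How array fields and type mapping are handled is left out of this sketch.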

Previously, if you wanted to plot battery voltage over time for your robot in Grafana, you would need to write a specialized ETL job to populate a database. Now, you can connect to Timescale in Grafana, and run a query like so:

SELECT time_bucket('30 seconds', ts) AS bucket_time, AVG(voltage)
FROM battery_state
WHERE $__timeFilter(ts)
GROUP BY bucket_time
ORDER BY bucket_time;
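For anyone unfamiliar with time_bucket, the query above groups rows into fixed 30-second windows and averages the voltage within each window. The sketch below reproduces that computation in plain Python on in-memory samples. The function name bucket_average and the sample values are made up for illustration, and aligning buckets to multiples of the width from zero is a simplification of how TimescaleDB picks bucket boundaries.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def bucket_average(
    samples: Iterable[Tuple[float, float]], width_s: float = 30.0
) -> List[Tuple[float, float]]:
    """Average (timestamp_seconds, value) samples per fixed-width time bucket.

    Returns (bucket_start, average) pairs ordered by bucket start, which
    mirrors the GROUP BY / ORDER BY in the SQL query.
    """
    # bucket_start -> [running sum, sample count]
    sums: Dict[float, List[float]] = defaultdict(lambda: [0.0, 0])
    for ts, value in samples:
        bucket_start = (ts // width_s) * width_s
        sums[bucket_start][0] += value
        sums[bucket_start][1] += 1
    return [(start, total / count) for start, (total, count) in sorted(sums.items())]


if __name__ == "__main__":
    # Hypothetical battery voltage readings as (seconds, volts).
    readings = [(0, 12.0), (10, 13.0), (29.9, 14.0), (30, 10.0), (45, 12.0)]
    for start, avg in bucket_average(readings):
        print(f"bucket starting at {start:>5.1f}s: avg voltage {avg:.2f}")
```

In practice the database does this work for you; the sketch is only meant to show what the query returns.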

We welcome suggestions from the ROS community for additional transformations. :slight_smile:

Additionally, we’re developing a hosted solution offering managed services for data ingestion, database management, and infrastructure for integrations including S3 and Grafana. This service will also provide tools for converting data back from Timescale to .mcap and .bag formats, and a web interface to monitor and share data with your team.

For more detailed information on installation and usage, please visit our GitHub README.

We eagerly await your feedback, suggestions, and contributions!

Hi, interesting work!

I tried it out with a sample dataset, demo.mcap, but ran into a “MemoryError”. Do you have any clue or hint as to what is going wrong? Thank you!

mcap-etl timescale ~/mx/data/demo.mcap
Converting mcap to rosbag: file=/home/simuser/mx/data/demo.mcap
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.15) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
Traceback (most recent call last):
  File "/home/simuser/.local/bin/mcap-etl", line 8, in <module>
    sys.exit(main())
  File "/home/simuser/mx/mcap-etl/mcap_etl/main.py", line 36, in main
    timescale.run(args.file, args.host, args.port, args.user, args.password, args.name)
  File "/home/simuser/mx/mcap-etl/mcap_etl/timescale/executor.py", line 12, in run
    file = mcap_converter.to_rosbag()
  File "/home/simuser/mx/mcap-etl/mcap_etl/convert/mcap.py", line 33, in to_rosbag
    for ts, ros_type, schema, topic, data in self.__message_generator():
  File "/home/simuser/mx/mcap-etl/mcap_etl/convert/mcap.py", line 23, in __message_generator
    for schema, channel, message in reader.iter_messages():
  File "/home/simuser/.local/lib/python3.8/site-packages/mcap/reader.py", line 189, in iter_messages
    summary = self.get_summary()
  File "/home/simuser/.local/lib/python3.8/site-packages/mcap/reader.py", line 248, in get_summary
    footer = next(StreamReader(self._stream, skip_magic=True).records)
  File "/home/simuser/.local/lib/python3.8/site-packages/mcap/stream_reader.py", line 119, in records
    record = self._read_record(opcode, length)
  File "/home/simuser/.local/lib/python3.8/site-packages/mcap/stream_reader.py", line 204, in _read_record
    self._stream.read(length - 9)
  File "/home/simuser/.local/lib/python3.8/site-packages/mcap/data_stream.py", line 26, in read
    data = self._stream.read(length)
MemoryError