I’m developing a time series database that stores a history of blob data. When I started the development, I focused on computer vision applications and edge computing, and I had a chance to use it in some real applications. Now I would like to try to integrate it with ROS because I feel it could be of some use in that area.
Unfortunately, I have very little experience with ROS, and before diving deeper I would like to get feedback from the community.
Could someone take a look at the project? Could it be useful? If so, what is the best way to integrate it into ROS?
P.S. Please don’t consider my post as self-promotion. I really want to be useful, but I’m short of time and I don’t want to waste it on solving non-existent problems.
Interesting project and I agree that ROS could benefit a lot from an easy integration with a timeseries database. To get started, I would get some sample bag files or mcap files, build the necessary ingest into your store, and then run some sample queries over them to show the value of this. Some sample bag files can be found in online robotics courses and the material from autonomous driving courses.
Question: what made you build a new timeseries DB from scratch, i.e., why are existing ones not suitable for blob storage? Have you run any comparisons to existing timeseries databases (e.g., Prometheus, VictoriaMetrics) and column-oriented databases (ClickHouse, etc.)? I’d be very interested to see what sort of database might be best suited for ROS data, which can vary a lot depending on the (ROS) topic.
Hi @chfritz, thank you for the advice. I’ll learn more about bag and mcap files.
I’m glad that you asked about my motivation, because I would never have started such a big project if I had found a proper solution for my problems. I work at a company that integrates AI algorithms for condition monitoring in industrial environments. Very often we use photos from CV cameras to measure product quality, or vibration data to diagnose the mechanical parts of a production machine.
To train and validate our models, we need a history of this data. Unfortunately, these are quite big blobs, and you can’t put them into a classical TSDB. Such databases have a limit on record size (for InfluxDB it is 64kB), and they can’t handle blobs as efficiently as they handle numbers. You can try to use S3-like storage, but it is not so easy to fetch data by time interval. The HTTP API works like a file system: you need to browse your data or know a path. We tried to use a TSDB to store paths to objects in the S3-like storage so that we could avoid the browsing, which may be very time-consuming if you have millions of files. Besides the complexity and the fact that this is already two databases, the next problem is data reduction.
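To make the "TSDB stores paths" workaround concrete, here is a minimal sketch of the idea in plain Python. It is only an illustration, not our production code: the `TimeIndex` class, the integer timestamps, and the object keys are all made up for the example, and a real setup would keep the index in a TSDB and the blobs in object storage.

```python
import bisect


class TimeIndex:
    """Hypothetical index: maps timestamps to object keys so that a time
    interval can be queried without browsing the object storage."""

    def __init__(self):
        self._timestamps = []  # sorted timestamps (e.g. microseconds)
        self._keys = []        # object keys, parallel to _timestamps

    def add(self, timestamp, key):
        # Keep both lists sorted by timestamp.
        pos = bisect.bisect_right(self._timestamps, timestamp)
        self._timestamps.insert(pos, timestamp)
        self._keys.insert(pos, key)

    def query(self, start, stop):
        """Return the keys with start <= timestamp < stop."""
        lo = bisect.bisect_left(self._timestamps, start)
        hi = bisect.bisect_left(self._timestamps, stop)
        return self._keys[lo:hi]


index = TimeIndex()
index.add(1000, "camera/img_1000.jpg")
index.add(3000, "camera/img_3000.jpg")
index.add(2000, "camera/img_2000.jpg")
print(index.query(1000, 3000))
# ['camera/img_1000.jpg', 'camera/img_2000.jpg']
```

Each key returned by the query still has to be fetched from the object storage in a separate request, which is where the second database comes in.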
We gather data continuously on an edge device, so we run out of disk space in a few days and have to remove old data. However, object storage doesn’t provide this functionality, because it is obviously not a time series database. You have to implement the data reduction on your own, and it isn’t as easy as it may look.
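The kind of data reduction I mean is essentially a FIFO quota: when the storage is full, the oldest records are dropped to make room for new ones. Below is a minimal in-memory sketch of that policy. The `FifoBlobStore` class and its parameters are hypothetical, it assumes records arrive in time order, and it does not show how any particular database implements this.

```python
from collections import deque


class FifoBlobStore:
    """Hypothetical store that evicts the oldest blobs once a size quota
    would be exceeded. Assumes writes arrive in time order."""

    def __init__(self, quota_bytes):
        self.quota_bytes = quota_bytes
        self.used_bytes = 0
        self._records = deque()  # (timestamp, blob), oldest first

    def write(self, timestamp, blob):
        if len(blob) > self.quota_bytes:
            raise ValueError("blob is larger than the whole quota")
        # Drop the oldest records until the new blob fits.
        while self.used_bytes + len(blob) > self.quota_bytes:
            _, old_blob = self._records.popleft()
            self.used_bytes -= len(old_blob)
        self._records.append((timestamp, blob))
        self.used_bytes += len(blob)

    def timestamps(self):
        return [ts for ts, _ in self._records]


store = FifoBlobStore(quota_bytes=10)
store.write(1, b"aaaa")
store.write(2, b"bbbb")
store.write(3, b"cccc")  # does not fit, so the record at t=1 is evicted
print(store.timestamps())
# [2, 3]
```

On a real disk it gets harder: the index and the files have to stay consistent, and a reader may still be using a record that is about to be removed.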
ReductStore is a hybrid of an S3-like object storage and a time series database, and it solved these problems, at least in my company.
I think the database could be useful if a robot has a camera or microphone, you need this data available for training, and you don’t want to bother with disk space and cleaning up old data. If it is relevant to ROS, I’m ready to invest some time in the integration.