6D pose estimators for unseen objects: seeking suggestions on a usable pipeline for moving robots

Dear ROS Community,

I hope this message finds you well. I am a VSLAM researcher currently working on a metric-semantic SLAM framework.

I have been following the research on 6D pose estimators for some time now, but I have found that the majority of frameworks neither support nor give instructions on how to add new objects from visual data recorded by a moving robot. Please note that there is no access to 2D/3D point cloud lidar.

Could you kindly suggest a pipeline/framework that allows me to train its network on new objects using RGB-D images?

Thank you for your time.

With best,
Azmyin

Can you clarify what you are asking for here? I don’t understand what you mean by:

Please note that there is no access to 2D/3D point cloud lidar

If you are referencing existing work you need to provide links / references to that work.

My experience with models like this is that you need to collect and label your own dataset, and then train on it. That data collection pipeline varies widely based on the application domain. Lots of researchers use existing datasets and conveniently skip over the messy data collection problem.

Hi,

Sorry for the late reply.

To clarify my statement regarding 2D/3D lidar: my initial approach to this problem was to use a bounding box regressor, as in this paper by Mousavian et al. The dataset this family of papers used was the KITTI 3D Object Detection dataset. However, at that time, the annotation tools needed to create the ground-truth (GT) bounding box poses required a 3D point cloud, which to my knowledge requires access to a 3D lidar. The UGVs that we have access to do not come with such a sensor.
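For context on the point cloud requirement: a depth image from an RGB-D camera can be back-projected into a 3D point cloud with the standard pinhole camera model. The sketch below is only an illustration under stated assumptions: the function name `depth_to_points`, the nested-list depth image in metres, and the intrinsics `fx`, `fy`, `cx`, `cy` are all hypothetical and not taken from any tool mentioned in this thread. Whether a given annotation tool accepts such a cloud in place of lidar data is not something this sketch establishes.

```python
def depth_to_points(depth_m, fx, fy, cx, cy):
    """Back-project a depth image into 3D points in the camera frame.

    depth_m: 2D nested list of depths in metres (0 or negative = invalid pixel).
    fx, fy, cx, cy: pinhole intrinsics in pixels (assumed known from calibration).
    Returns a list of (x, y, z) tuples, one per valid pixel, in row-major order.
    """
    points = []
    for v, row in enumerate(depth_m):      # v: pixel row (image y)
        for u, z in enumerate(row):        # u: pixel column (image x)
            if z <= 0:
                continue                   # skip pixels with no depth reading
            x = (u - cx) * z / fx          # pinhole model: u = fx * x / z + cx
            y = (v - cy) * z / fy          # pinhole model: v = fy * y / z + cy
            points.append((x, y, z))
    return points
```

In practice a vectorised NumPy version would be much faster; the plain-Python loop is used here only to keep the example dependency-free.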

I then attempted to find a neural network pipeline that uses dense pixel-level correspondences to estimate 6D object poses, with or without a CAD model. A promising candidate was Park et al.’s LatentFusion. But beyond the LINEMOD dataset and the MOPED dataset (introduced in the same paper), I could not find a reliable tool or instructions that explain how to add new objects in a way that is compliant with these formats.

I have recently found LabelImg3d, which may solve the problem of annotating a new dataset compliant with the KITTI 3D pose estimation format. I will post my experience with this tool in this thread.
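For readers unfamiliar with the target format, here is a minimal sketch of reading one line of a KITTI object label file. The field order follows the KITTI object development kit (type, truncated, occluded, alpha, 2D box, 3D dimensions, 3D location, rotation_y). The class and function names are hypothetical, and the sketch assumes the annotation tool exports standard KITTI label lines, which I have not verified for LabelImg3d.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class KittiObject:
    type: str                               # e.g. "Car", "Pedestrian"
    truncated: float                        # 0.0 (fully visible) to 1.0
    occluded: int                           # 0, 1, 2 or 3 (unknown)
    alpha: float                            # observation angle in radians
    bbox: Tuple[float, float, float, float] # 2D box: left, top, right, bottom (pixels)
    dimensions: Tuple[float, float, float]  # 3D size: height, width, length (metres)
    location: Tuple[float, float, float]    # 3D position x, y, z in camera frame (metres)
    rotation_y: float                       # yaw around the camera Y axis in radians


def parse_kitti_label_line(line: str) -> KittiObject:
    """Parse one whitespace-separated KITTI label line (15 fields, optional score ignored)."""
    f = line.split()
    if len(f) < 15:
        raise ValueError(f"expected at least 15 fields, got {len(f)}")
    return KittiObject(
        type=f[0],
        truncated=float(f[1]),
        occluded=int(f[2]),
        alpha=float(f[3]),
        bbox=(float(f[4]), float(f[5]), float(f[6]), float(f[7])),
        dimensions=(float(f[8]), float(f[9]), float(f[10])),
        location=(float(f[11]), float(f[12]), float(f[13])),
        rotation_y=float(f[14]),
    )
```

The 3D box is fully described by `dimensions`, `location`, and `rotation_y`, so this format captures only a yaw angle rather than a full 3-DoF rotation. That is worth keeping in mind if a full 6D pose is needed.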

This topic is still very much in a research stage. You can find a good overview of the most recent and best performing methods in the BOP challenge: https://bop.felk.cvut.cz/home/

Hi @Tuebel, sorry for the very late reply. Yes, I looked into the BOP website, but as @Katherine_Scott was saying, most of those SOTA methods rely on pre-built datasets. In my case, I need to train the 6D pose estimator for my own environment.

I will write back in this thread if the two tools I found do indeed help me train a 6D pose estimator on a custom dataset.