Extending yolact_ros with depth images for real-time 3D instance segmentation

Hello everyone,

I’m writing this topic to introduce my latest work, depth_yolact_ros:

A ROS wrapper for YOLACT that extends the existing yolact_ros wrapper by using a depth image to generate 3D bounding boxes and pointclouds of the detected objects.
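
A 3D bounding box can be derived from each instance's pointcloud. The post doesn't say which box type the package computes, so this is a minimal sketch assuming an axis-aligned box in the camera frame (the function name is hypothetical). Its center and size map directly onto the pose.position and scale fields of a visualization_msgs/Marker of type CUBE.

```python
import numpy as np


def axis_aligned_box(points):
    """Return (center, size) of the smallest axis-aligned box enclosing Nx3 points."""
    points = np.asarray(points, dtype=np.float64)
    lo = points.min(axis=0)  # per-axis minimum (x, y, z)
    hi = points.max(axis=0)  # per-axis maximum (x, y, z)
    return (lo + hi) / 2.0, hi - lo


# Example: three points from a hypothetical filtered instance cloud (metres).
center, size = axis_aligned_box([[0.0, 0.0, 2.0], [0.4, 1.7, 2.3], [0.1, 0.5, 2.1]])
print(center, size)
```

An axis-aligned box is the cheapest option; an oriented box (for example, one aligned with the principal axes of the points) would fit tilted objects more tightly at extra cost.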

Last week, I was working on a task to detect and localize people in 3D, with the requirement that it run in real time (around 10 fps). Knowing about YOLACT, I thought it would be the best fit for my case, and I found an already well-developed ROS wrapper for it.

depth_yolact_ros takes the detection boxes and their associated masks from YOLACT and crops the depth image to each box. The masked depth pixels are then converted to a pointcloud using the camera intrinsics from camera_info. Because a mask can include mislabeled pixels (e.g. background pixels along the edge of an object), the points are filtered in two stages: k-means clustering first, then a Gaussian model that rejects outliers along the depth axis. Each detected instance is processed in its own thread, and all results are published on a MarkerArray topic and a pointcloud topic.
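
The per-instance steps above can be sketched in Python with NumPy. This is a minimal sketch, not the package's actual code, and every function name in it is hypothetical. It assumes a full-size boolean mask per detection, a box given as (x1, y1, x2, y2) pixels, a uint16 depth image in millimetres (hence depth_scale=0.001), and intrinsics fx, fy, cx, cy read from the K matrix of CameraInfo. The k-means step clusters on depth only with k=2 and keeps the most populated cluster; the Gaussian step keeps points within n_sigma standard deviations of the mean depth. Both are assumed choices.

```python
"""Hypothetical sketch: masked depth -> filtered 3D points, one thread per instance."""
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def masked_depth_to_points(depth, mask, box, fx, fy, cx, cy, depth_scale=0.001):
    """Back-project masked depth pixels inside `box` to camera-frame XYZ (pinhole model)."""
    x1, y1, x2, y2 = box
    d = depth[y1:y2, x1:x2]              # crop depth to the detection box
    m = mask[y1:y2, x1:x2] & (d > 0)     # keep masked pixels that have a valid depth
    v, u = np.nonzero(m)                 # row/col indices inside the crop
    z = d[v, u].astype(np.float64) * depth_scale
    u = u + x1                           # back to full-image pixel coordinates
    v = v + y1
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)


def kmeans_keep_largest(points, k=2, iters=20):
    """1-D k-means on depth; keep the most populated cluster (assumed to be the object)."""
    if len(points) < k:
        return points
    z = points[:, 2]
    centroids = np.percentile(z, np.linspace(10, 90, k))  # spread the initial centroids
    for _ in range(iters):
        labels = np.argmin(np.abs(z[:, None] - centroids[None, :]), axis=1)
        updated = np.array([z[labels == j].mean() if np.any(labels == j) else centroids[j]
                            for j in range(k)])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    labels = np.argmin(np.abs(z[:, None] - centroids[None, :]), axis=1)
    largest = np.bincount(labels, minlength=k).argmax()
    return points[labels == largest]


def gaussian_depth_filter(points, n_sigma=2.0):
    """Fit a Gaussian to depth and drop points more than n_sigma std devs from the mean."""
    if len(points) == 0:
        return points
    z = points[:, 2]
    mu, sigma = z.mean(), z.std()
    if sigma == 0.0:
        return points
    return points[np.abs(z - mu) <= n_sigma * sigma]


def process_instances(depth, masks, boxes, intrinsics, max_workers=4):
    """Run the per-instance pipeline in a thread pool; intrinsics = (fx, fy, cx, cy)."""
    def one(mask, box):
        pts = masked_depth_to_points(depth, mask, box, *intrinsics)
        return gaussian_depth_filter(kmeans_keep_largest(pts))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(one, masks, boxes))


if __name__ == "__main__":
    depth = np.full((120, 160), 5000, dtype=np.uint16)  # background at 5 m
    depth[40:80, 60:100] = 2000                         # object at 2 m
    mask = np.zeros(depth.shape, dtype=bool)
    mask[38:82, 58:102] = True                          # mask bleeds 2 px into the background
    (cloud,) = process_instances(depth, [mask], [(58, 38, 102, 82)], (525.0, 525.0, 80.0, 60.0))
    print(cloud.shape)  # (1600, 3): the 336 bled background pixels are dropped
```

Converting the result to a sensor_msgs/PointCloud2 and publishing it is left out so the sketch runs without ROS. The thread pool mirrors the one-thread-per-instance design; NumPy releases the GIL in many array operations, so threads can overlap some work, but the actual gain depends on the workload.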

There are many modifications that could make the package much faster, and I have included a "What's next?" section in the GitHub repo. If anyone is willing to contribute, please feel free to start, or contact me if you have any questions!

Here are some demo videos:

Note:

  • I would like to thank Kingdom Technologies for this task.
  • If you have seen my last topic here or on LinkedIn about swerve_steering_controller, I haven’t forgotten about it. I have managed to improve the quality of the odometry a little, and I will be finishing the rostests and gtests and making a pull request soon, hopefully!

Thank you!
