Hello ROS Community,
We are very excited to announce our recent work on multi-domain semantic segmentation, entitled MSeg, which we presented at CVPR last month. You can try out one of our models in Google Colab on your own images here. We trust these models will be useful to roboticists – please watch our YouTube video to see the results of our single MSeg model in dozens of different environments – indoors, driving, in crowds, mounted on a drone, and more.
We’ve trained semantic segmentation models on MSeg, a single, composite dataset that unifies 7 of the largest semantic segmentation datasets (COCO+COCO Stuff, ADE20K, Mapillary Vistas, Indian Driving Dataset, Berkeley Deep Drive, Cityscapes, and SUN RGB-D). Since these datasets use different taxonomies, we reconcile the taxonomies and bring the pixel-level annotations into alignment by relabeling more than 220,000 object masks in more than 80,000 images, requiring more than 1.34 years of collective annotator effort.
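To make the reconciliation concrete, here is a minimal sketch of the kind of remapping step involved: each dataset's local class IDs are translated into a shared, unified label space before training. The class IDs and mapping below are made up for illustration – the actual MSeg mapping is defined in the released dataset, not here.

```python
import numpy as np

# Hypothetical mapping from one dataset's local class IDs to unified IDs.
# (Illustrative only; the real MSeg taxonomy mapping ships with the dataset.)
CITYSCAPES_TO_UNIFIED = {0: 7, 1: 12, 2: 3}  # e.g. road, sidewalk, building

def remap_labels(label_map: np.ndarray, mapping: dict, ignore_id: int = 255) -> np.ndarray:
    """Remap a dataset-specific label image into a unified taxonomy.

    Pixels whose class has no unified counterpart are set to `ignore_id`
    so a training loss can skip them.
    """
    lut = np.full(256, ignore_id, dtype=np.uint8)  # lookup table over byte IDs
    for local_id, unified_id in mapping.items():
        lut[local_id] = unified_id
    return lut[label_map]  # vectorized per-pixel remap

labels = np.array([[0, 1], [2, 99]], dtype=np.uint8)  # 99: no unified class
print(remap_labels(labels, CITYSCAPES_TO_UNIFIED))
# [[  7  12]
#  [  3 255]]
```

A lookup table keeps the per-image remapping a single vectorized indexing operation, which matters when relabeling hundreds of thousands of masks.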
While models in the literature are often trained and tested on data from the same domain/distribution, we present a different approach to evaluating a model’s robustness – zero-shot cross-dataset generalization – that we believe is more aligned with real-world performance. We test our model on a suite of held-out datasets – PASCAL VOC, PASCAL Context, CamVid, WildDash, KITTI, ScanNet – and find that it functions effectively across domains, generalizing to datasets that were never seen during training. Training on MSeg yields substantially more robust models than training on any individual dataset or naively mixing datasets. Our approach also outperforms recent state-of-the-art multi-task learning and domain generalization algorithms (see our paper for more details).
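For readers unfamiliar with this evaluation protocol: zero-shot cross-dataset evaluation simply means scoring a trained model on a held-out dataset with the standard mean IoU metric, with no fine-tuning on that dataset (predictions in the unified taxonomy are first mapped to the test dataset's classes). A minimal, generic mIoU sketch – not the MSeg evaluation code, which is linked below:

```python
import numpy as np

def confusion_matrix(pred: np.ndarray, gt: np.ndarray,
                     num_classes: int, ignore_id: int = 255) -> np.ndarray:
    """Accumulate a class confusion matrix over valid (non-ignored) pixels."""
    valid = gt != ignore_id
    idx = gt[valid].astype(np.int64) * num_classes + pred[valid].astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def mean_iou(conf: np.ndarray) -> float:
    """mIoU: mean over classes of TP / (TP + FP + FN), skipping absent classes."""
    tp = np.diag(conf)
    fp = conf.sum(axis=0) - tp  # predicted as class c but actually something else
    fn = conf.sum(axis=1) - tp  # actually class c but predicted as something else
    denom = tp + fp + fn
    iou = np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)
    return float(np.nanmean(iou))

# Toy example: 4 pixels, 2 classes, one pixel misclassified.
gt = np.array([0, 0, 1, 1])
pred = np.array([0, 1, 1, 1])
print(mean_iou(confusion_matrix(pred, gt, num_classes=2)))  # (1/2 + 2/3) / 2
```

In practice the confusion matrix is summed over every image in the held-out dataset before computing the final mIoU, so per-image class imbalance does not skew the score.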
For those of you with interest in autonomous driving, a model trained on MSeg ranks first on the WildDash leaderboard for robust semantic segmentation, with no exposure to WildDash data during training.
A few links:
- Read our paper: PDF link
- Download the dataset here.
- Download pre-trained semantic models, demo scripts, and evaluation code here.
- Watch our teaser video.
John Lambert*, Zhuang Liu*, Ozan Sener, James Hays, and Vladlen Koltun. MSeg: A Composite Dataset for Multi-domain Semantic Segmentation. Computer Vision and Pattern Recognition (CVPR), 2020.
*Indicates equal contribution.