Proposal - New Computer Vision Message Standards

Thanks for the awesome feedback, everyone! I’ll try to address everything that was brought up.

First, let me note that although I only created Classification and Detection messages, I think it makes sense to keep this as a general vision_msgs package, so additional computer vision-related messages can be added over time. I think that's more useful than making an overly specific classification_msgs or similar.

@reinzor, thanks for linking to your message definitions! I think annotations are already covered by the existing implementation: you could provide the bounding box coordinates in a Detection message, and the most likely label as the only result in the class probabilities. If we want to add other information, such as the color of the outline, then maybe that would be a better fit for visualization_msgs or another package.
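
To make that concrete, here's a rough sketch of what a hand-labeled annotation could look like when expressed as a 2D detection. The field names and types below are just placeholders to illustrate the idea, not the actual proposed definitions:

```
# Illustrative sketch only -- field names and types are placeholders,
# not the proposed definitions.

# Region of the image that was annotated (hypothetical 2D box type:
# center x/y in pixels plus width and height).
BoundingBox2D bbox

# Class probabilities. For a manual annotation there is exactly one
# entry, holding the annotated label with a score of 1.0.
string[]  labels    # e.g. ["mug"]
float32[] scores    # e.g. [1.0]
```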

On another note, is human pose estimation standardized enough to warrant a custom message type? Or is it best described by a TF tree, an arbitrary set of geometry_msgs/Pose, or some other existing ROS construct? I'm thinking of the fact that different human detectors provide different levels of fidelity, so it might be difficult to standardize.
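
Just to illustrate the "arbitrary set of poses" option, a minimal sketch might look something like the following, where a lower-fidelity detector would simply publish fewer keypoints. Again, the names and fields here are purely hypothetical:

```
# Hypothetical sketch of a keypoint-based human pose estimate -- not a
# proposal, just illustrating the "set of geometry_msgs/Pose" option.
std_msgs/Header header

# One entry per detected keypoint; detectors with less fidelity publish
# fewer entries rather than filling a fixed-size skeleton.
string[]             keypoint_names   # e.g. "nose", "left_wrist", ...
geometry_msgs/Pose[] keypoints        # same length as keypoint_names
float32[]            confidences      # optional per-keypoint confidence
```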

@ruffsl, my idea with having two poses is that the bounding box could actually have a different pose from the expressed object pose. For example, the bounding box center for a coffee mug might have some z-height and be off-center with respect to the body of the mug, while the expressed object pose might be centered on the cylindrical portion of the mug and placed at its bottom. However, maybe it makes sense to forgo the bounding box information, as this could be stored in the object metadata, along with a mesh, etc.
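
To illustrate the distinction, a 3D detection carrying both poses might look roughly like this (again, these names are placeholders, not final):

```
# Illustrative sketch of a 3D detection with two distinct poses.

# Pose of the object itself, e.g. centered on the mug's cylindrical
# body and placed at its base.
geometry_msgs/Pose object_pose

# Bounding box, whose center pose can differ from object_pose
# (e.g. raised in z and shifted toward the handle).
geometry_msgs/Pose    bbox_pose
geometry_msgs/Vector3 bbox_size   # extent of the box along each axis
```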

On the topic of nesting, I’m open to the idea of flattening the hierarchy and having Classification/Detection 2D/3D all include a new CategoryDistribution message. I’m not sure how much message nesting is considered standard practice, so I’ll look at some other packages to get an idea.
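
For reference, the flattened version I have in mind would look roughly like this, with the caveat that none of these names are settled:

```
# CategoryDistribution.msg (sketch)
string[]  ids      # category identifiers; could also be integer ids
float32[] scores   # one probability per entry in ids

# Detection2D.msg (sketch), reusing the distribution directly instead
# of nesting it inside a separate classification message
std_msgs/Header      header
CategoryDistribution results
BoundingBox2D        bbox   # hypothetical 2D box type
```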

@Jeremie, I like the idea of adding a standardized VisualFeature.msg or similar message, as long as there is some common baseline that can cover a lot of feature types. In my own experience, there's usually a lot of variation in how a feature is actually defined and represented, so I haven't been able to find a "lowest common denominator" myself. If you feel there's something that could be broadly useful, please feel free to post it here or make a pull request. I also agree with @Loy: although many classifiers use features internally, this should be hidden in the implementation except in special cases like the SLAM case described.