Proposal - New Computer Vision Message Standards

It’s not clear to me whether the proposal supports per-pixel segmentations. The source field of Classification2D and Classification3D might be usable for them, but the documentation doesn’t say whether that is an intended use. I also find the name source a bit confusing.

There may also be other types of detection besides bounding boxes and segmentations that I’m not thinking of right now, but those two seem like a pretty solid base to start from.

ClassifierInfo.msg:

<snip>

# ROS parameter name where the metadata database is stored in XML format.
# The exact information stored in the database is left up to the user.
string database_param

Why XML? Why not YAML or JSON, or leave the format completely implementation-defined? Alternatively, the field could hold the name of a tree of parameters on the ROS parameter server.
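To illustrate why the format feels arbitrary to me, here is a rough sketch that loads the same metadata from XML, from JSON, and from a nested dict of the kind a parameter tree would give you. The class-name-to-ID layout and all of the names below (XML_DB, JSON_DB, PARAM_TREE and the load_* helpers) are made up for this example; the proposal itself leaves the database contents up to the user.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical metadata: class names mapped to numeric class IDs.
# None of these layouts come from the proposal; they are assumptions.
XML_DB = """<classes>
  <class name="person" id="0"/>
  <class name="car" id="1"/>
</classes>"""

JSON_DB = '{"person": 0, "car": 1}'

# Stand-in for the nested dict you would get back when reading a whole
# parameter namespace; hard-coded here so the sketch runs without ROS.
PARAM_TREE = {"person": 0, "car": 1}


def load_xml(text):
    """Parse the XML string into an {id: name} lookup table."""
    root = ET.fromstring(text)
    return {int(c.get("id")): c.get("name") for c in root.iter("class")}


def load_json(text):
    """Parse the JSON string into the same {id: name} lookup table."""
    return {int(class_id): name for name, class_id in json.loads(text).items()}


def load_param_tree(tree):
    """Invert the {name: id} dict into the same {id: name} lookup table."""
    return {int(class_id): name for name, class_id in tree.items()}


if __name__ == "__main__":
    print(load_xml(XML_DB))
    print(load_json(JSON_DB))
    print(load_param_tree(PARAM_TREE))
```

All three loaders end up with the same lookup table, and the parameter-tree version needs no string parsing on the consumer side at all.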