Proposal - New Computer Vision Message Standards

I’ve been asked to provide overall feedback on the proposal. Please find my review below.

In general, I find the effort very valuable and useful. That said, I’d like to add some comments on the proposal:

BoundingBoxXD.msg:

  • Do we need to agree on where the origin of the bounding box (BB) is (top-left, center, …)? If so, state it explicitly in the message comments.

  • It’s not rigorous to assign a pose (point + orientation) to a field called “origin”, which implicitly is just a point. Calling it “pose” or “origin_pose” may be more appropriate.

  • The size in 2D is implemented with two ints, while in 3D it is a Vector3 of doubles. I wonder whether there are situations where a floating-point 2D BB is required (e.g. subpixel approaches). Just flagging the inconsistency.
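
To make the first and third points concrete, here is a minimal Python sketch of a 2D box that states its origin convention explicitly and stores floating-point values. All names (`Origin`, `BoundingBox2D`, `top_left`, `center`) are hypothetical and not part of the proposal; I also assume image coordinates with x to the right and y downwards.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Origin(Enum):
    """Hypothetical convention flag: which point of the box (x, y) refers to."""
    TOP_LEFT = 0
    CENTER = 1


@dataclass
class BoundingBox2D:
    """Illustrative 2D box; field names are assumptions, not the proposal's."""
    # Floats rather than ints, so subpixel detectors lose no precision.
    x: float
    y: float
    width: float
    height: float
    # The convention is stored (or documented) explicitly, never implied.
    origin: Origin = Origin.CENTER

    def top_left(self) -> Tuple[float, float]:
        """Return the top-left corner regardless of the stored convention."""
        if self.origin is Origin.TOP_LEFT:
            return (self.x, self.y)
        return (self.x - self.width / 2.0, self.y - self.height / 2.0)

    def center(self) -> Tuple[float, float]:
        """Return the center regardless of the stored convention."""
        if self.origin is Origin.CENTER:
            return (self.x, self.y)
        return (self.x + self.width / 2.0, self.y + self.height / 2.0)
```

In an actual .msg file the same effect could be had with a single agreed convention written in the comments; the enum above only illustrates how much ambiguity an unstated origin leaves.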

DetectionXD.msg:

  • Detectors sometimes also provide uncertainty in the pose space of the detection. Providing only a BB as the spatial data of a detection leaves no way to convey this valuable information, which matters especially for fusion (e.g. tracking) algorithms that need to work with pose-space uncertainty.
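
As an illustration of why this matters, the sketch below fuses two detections of the same object by inverse-variance weighting, which is only possible if each detection carries its uncertainty. This is a simplified stand-in for a real tracker, assuming independent estimates and a diagonal covariance (one variance per axis); `DetectionWithUncertainty` and `fuse` are hypothetical names, not part of the proposal.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DetectionWithUncertainty:
    """Hypothetical detection carrying position uncertainty besides the box."""
    label: str
    position: List[float]  # e.g. [x, y] or [x, y, z]
    variance: List[float]  # per-axis variance (diagonal covariance only)


def fuse(a: DetectionWithUncertainty,
         b: DetectionWithUncertainty) -> DetectionWithUncertainty:
    """Fuse two independent estimates by inverse-variance weighting."""
    position, variance = [], []
    for pa, va, pb, vb in zip(a.position, a.variance, b.position, b.variance):
        w_a, w_b = 1.0 / va, 1.0 / vb  # more certain estimates weigh more
        position.append((w_a * pa + w_b * pb) / (w_a + w_b))
        variance.append(1.0 / (w_a + w_b))  # fused variance always shrinks
    return DetectionWithUncertainty(a.label, position, variance)
```

With a BB alone, both inputs would have to be weighted equally. On the message side, a full covariance could be carried the way geometry_msgs/PoseWithCovariance does it, as a row-major 6x6 float64[36] array.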

Lack of Services

  • Consider whether it would be useful to add some services to the package, mainly based on the proposed messages. This would allow detectors to work in a client-server mode, with customizable requests.
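
A rough sketch of what I have in mind, written as plain Python rather than as a real .srv definition: the request carries the client's customization, and the response reuses a detection type. Every name here (`DetectRequest`, `DetectResponse`, `handle_detect`, and the `min_score` and `labels` fields) is an assumption for illustration only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Detection:
    """Stand-in for the proposed detection message (fields are assumed)."""
    label: str
    score: float


@dataclass
class DetectRequest:
    """Hypothetical request: the client customizes what it wants back."""
    min_score: float = 0.0
    labels: List[str] = field(default_factory=list)  # empty means any label


@dataclass
class DetectResponse:
    detections: List[Detection]


def handle_detect(request: DetectRequest,
                  candidates: List[Detection]) -> DetectResponse:
    """Server-side handler: filter raw detector output per the request."""
    kept = [
        d for d in candidates
        if d.score >= request.min_score
        and (not request.labels or d.label in request.labels)
    ]
    return DetectResponse(kept)
```

For example, a client that only cares about confident person detections would send `DetectRequest(min_score=0.5, labels=["person"])`, while a default request returns everything the detector produced.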

Best regards, and thanks again for the effort!