Proposal - New Computer Vision Message Standards

Is it likely that many pixels in the image will have identical distributions? It seems that “apple” pixels near the edge of the apple would have a different probability distribution than those near the center. All the ML-based segmentation systems I’ve seen either predict a single output class per pixel (such as a binary classifier) or produce a probability vector for every pixel.
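For concreteness, here is a minimal numpy sketch contrasting those two output styles; the shapes and variable names are illustrative, not from any proposed message:

```python
import numpy as np

# Toy segmentation output: height x width x num_classes class scores.
h, w, num_classes = 4, 4, 3
logits = np.random.randn(h, w, num_classes)

# Style 1: a full probability vector per pixel (softmax over classes).
shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)

# Style 2: a single output class per pixel (argmax collapses the vector).
labels = probs.argmax(axis=-1)  # shape (h, w), integer class indices
```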

Defining a small set of distributions that the image indexes into, then transmitting that set with every result, seems like a halfway solution. I think these two options would cover the use cases:

  1. The image is segmented into a small, finite set of output classes whose probability distributions do not vary in space or time: use an Image message where the lookup value of each pixel is the output class. If desired, the static probability distribution for each class can be communicated once, such as via a single CategoryDistribution[] message or via the parameter server (see the first sketch after this list).

  2. The output segmentation includes varying probability distributions that are calculated per pixel or per small region: use a CategoryDistribution[] whose length equals the number of pixels in the image, so that each pixel has its own distribution that may change every frame (see the second sketch after this list).
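As a sketch of option 1, here is a minimal rospy publisher that packs class indices into a mono8 Image; the node and topic names are placeholders, and the one-time CategoryDistribution[] (whose fields the proposal hasn't pinned down yet, as far as I know) is only noted in a comment:

```python
import numpy as np
import rospy
from sensor_msgs.msg import Image

# The one-time per-class distributions would go out as a single
# CategoryDistribution[] message (or parameter server entries) at startup;
# omitted here because that message's fields are not yet settled.

rospy.init_node('segmentation_publisher')  # placeholder node name
pub = rospy.Publisher('segmentation', Image, queue_size=1)  # placeholder topic

def publish_labels(labels):
    """Publish an (h, w) uint8 array of class indices as a mono8 Image."""
    msg = Image()
    msg.header.stamp = rospy.Time.now()
    msg.height, msg.width = labels.shape
    msg.encoding = 'mono8'  # each pixel's lookup value is its output class
    msg.step = msg.width    # one byte per pixel, no row padding
    msg.data = labels.astype(np.uint8).tobytes()
    pub.publish(msg)
```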
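And a sketch of option 2, flattening a per-pixel probability tensor into a list of distributions; the CategoryDistribution stand-in below uses ids/scores field names that are my assumption, not something settled by the proposal:

```python
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class CategoryDistribution:
    """Stand-in for the proposed message; ids/scores are assumed field names."""
    ids: List[int]
    scores: List[float]

def per_pixel_distributions(probs):
    """probs: (h, w, num_classes) array -> one distribution per pixel.

    Pixels are flattened row-major, so the receiver can pair index i
    with pixel (i // w, i % w) using the image dimensions.
    """
    h, w, c = probs.shape
    return [CategoryDistribution(ids=list(range(c)), scores=row.tolist())
            for row in probs.reshape(h * w, c)]
```

Note the bandwidth implication: option 2 sends on the order of height × width × num_classes scores per frame, which is exactly why option 1 looks attractive whenever the distributions are static.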

Let me know if I missed something! If you have some code available for a use case, that’s really helpful. I’m currently in the process of writing example classifiers to use the Classification/Detection messages and finding it a useful exercise.