About QoS of Images

Dear ROS Community,

These days I have been working with various packages related to images (ros_astra_camera, image_common, image_pipeline) and Webots. I have found that the QoS of the images is a problem. As far as I know, an image should be published by default with rclcpp::SensorDataQoS (let’s talk in C++, although the same applies to any client library), and the associated info with rclcpp::QoS(N).transient_local(). In general, any sensor data should be published like this.

I would like to know if this is correct, and if we should start to make it effective in the packages we are developing. I was working on several PRs to these packages to make this effective, but I think maybe this is the right place to agree on it.

Finally, I would like to know why (I am sure there is a technical reason) a subscriber has its own QoS, instead of automatically using the QoS of the publisher (with warnings if there is more than one publisher with a different QoS). I think this would make life more pleasant for us. :blush:

Happy coding
Francisco

@fmrico I know what you are speaking about. I had the same doubts when I started working on the ZED ROS2 wrapper.

QoS is indeed a big issue when speaking about images, and about sensors in general. I think a REP describing the right way to set the QoS for sensors would be really important, so that there is a standard for all nodes to follow.
I also agree that a subscriber should be able to adapt automatically to the QoS of the publishers so that it receives all the messages. It is easy to set the wrong QoS for the subscriber, and this will only become more likely as more publishers are written without a standard design document to follow.

PS: speaking of images, be aware that image_transport is not yet compatible with lifecycle nodes.

I just ran into this today; I’m glad to know other people are thinking about it.
I think the subscribers set their own QoS so that they don’t have to wait for a message to be published before they can register their subscription with DDS.
I’d really like to see clear warnings in rqt and rviz. At the moment they just silently drop image frames.
Docs regarding the design decisions behind QoS, and its usage would be fantastic.

Does anyone know of an example for QoS overrides at launch time?

edit: extra context for newcomers
https://index.ros.org/doc/ros2/Concepts/About-Quality-of-Service-Settings/

@BrettRD if you look at the ZED ROS2 wrapper I added the possibility of setting the QoS of the publishers for all the published topics. Each parameter is available in the configuration YAML file and it overrides the default:

I wonder if remapping pub/sub QoS would be possible, like it is for topic names. That could help a lot here.

There’s a new ROS 2 feature, available in Galactic, that lets you configure QoS with ROS parameters (original proposal). You can check out the upcoming release notes or design doc for more information. You can try out the new feature if you install ROS 2 Rolling or build from source.

Oh this is great! Would this be the way to make it so that all QoS can be overridden? Would the default for QosOverridingOptions be true or false?

I feel like a better default would be towards reconfigurability but I could see the security folks potentially having an issue with that?

The feature is “opt-in”; a node author must specify which QoS settings are allowed to be overridden. For convenience, there is a default set of QoS settings that can be enabled for override (e.g. here’s the C++ API). The motivation for having it opt-in is to force node authors to stop and think about possible consequences to their implementation if they let users override QoS settings.

Ok after looking over the doc more closely, I can see the point of “opt-in” better.

Though, to play out the 3 situations I see:

  • Opt In: QosOverridingOptions defaults to false when unspecified. This will annoy maintainers, since people will constantly want these settings to be changeable. While it could trigger a “stop and think” moment while creating interfaces, it doesn’t require one.
  • Opt Out: QosOverridingOptions defaults to true when unspecified. This will annoy users if they change settings in ways that don’t meet the basic requirements of the node without understanding it more thoroughly. There is no obvious “stop and think” moment for users when things aren’t working right. This is probably the worst option.
  • Required: No default for QosOverridingOptions; it is required for each sub/pub creation. This will annoy everyone equally and force that “stop and think” moment for each interface. When it is used correctly, neither users nor maintainers get annoyed.

So my suggestion is to make it a required field with no default, like QoS is when creating each interface. Alternatively, provide different create_* factories, or default QoS profiles for reconfigurable and non-reconfigurable subs/pubs, to hide that additional option from users (new default QoS profiles for each sounds like the most reasonable to me).

Is there a particular reason the defaults for publishers are the same as for subscribers?
The QoS of the subscription functions as a minimum requirement, and the system defaults are needlessly strict.
Normally when people are just poking around, trying things out, and developing a node, the exact QoS isn’t critical as long as pubs and subs are compatible.
If the default subscription requirements were relaxed across the board, and the publishers stayed strict, all nodes would be compatible until the developer started fiddling with QoS.

At the moment, QoS incompatibilities and opt-in/opt-out are going to create a substantial configuration burden.
Safe defaults would help greatly with the learning curve.

Although there is no technical reason why subscribers can’t automatically detect and adapt to publishers, there are two reasons they do not:

  1. If there is more than one publisher on the topic with different QoS settings, the client library would have to either choose one and risk being incompatible with the other, or somehow mash the two sets of settings together to produce a single superset.
  2. Subscribers are their own entity and know best what their own requirements and resources are. They need to set their own QoS properties appropriately.

Can you be more specific about where they are too strict for you?

Can you be more specific about where they are too strict for you?

Image topics in particular don’t need reliable transport on telepresence platforms, especially when the biggest cause of frame dropping is network congestion. Dropped frames are ok if they save your network stability and latency.

gazebo_plugins has changed its sensor plugins so that publishers use the recommended rclcpp::SensorDataQoS (best effort reliability) instead of the implicit rclcpp::SystemDefaultsQoS. This breaks rqt image_viewer because its image subscription follows the default of demanding reliable transport.

For the sake of new users, subscribers should make do with what they can get until the user decides they want to learn about QoS.
This is possible by splitting rclcpp::SystemDefaultsQoS into two separate but compatible profiles: a strict one for publishers and a permissive one for subscribers.

edit: grammar

Unfortunately there is no “permissive” mode for QoS settings like reliability and durability in DDS, and we have no way of implementing it easily. You can read about this issue w.r.t. “latching” (ROS 1 term), or durability in ROS 2/DDS terminology:

Basically, DDS requires that both endpoints (publishers and subscribers) specify their QoS, and there are parts of the 2x2 matrix which simply do not work, e.g. when a subscription’s QoS is reliable but the publisher is best effort. In ROS 1 this would have “just worked”, and we want that feature, i.e. a reliability QoS setting for subscriptions that is like “reliable if it is available, but best effort if it is not”, but DDS does not provide a setting like this. In the issue mentioned above I describe how we might work around this for durability. For reliability, we discussed having two topics, one reliable and one best effort, with subscriptions subscribing to both. Aside from being horrendously complicated, that approach suffers from a double-delivery problem and further complicates the queuing of messages in the history cache (now there are two history caches).
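
To make the matrix concrete, here is a minimal standalone sketch of the request-versus-offered rule for reliability and durability. It is a toy model, not the rclcpp or DDS API: the Qos struct, the compatible function, and the kSensorData / kDefault constants are hypothetical names chosen for illustration, and it assumes only these two policies matter (real matching also looks at other policies such as deadline and liveliness).

```cpp
// Standalone toy model of DDS request-versus-offered matching.
// All names here are hypothetical; this is not the rclcpp API.

// Each policy is ordered from the weakest to the strongest guarantee.
enum class Reliability { BestEffort = 0, Reliable = 1 };
enum class Durability { Volatile = 0, TransientLocal = 1 };

struct Qos {
  Reliability reliability;
  Durability durability;
};

// A subscription matches a publisher only if, for every policy, the
// publisher offers at least as much as the subscription requests.
constexpr bool compatible(Qos offered, Qos requested) {
  return offered.reliability >= requested.reliability &&
         offered.durability >= requested.durability;
}

// Assumed to mirror the reliability/durability of rclcpp::SensorDataQoS
// (best effort, volatile) and of a plain rclcpp::QoS(N) (reliable, volatile).
constexpr Qos kSensorData{Reliability::BestEffort, Durability::Volatile};
constexpr Qos kDefault{Reliability::Reliable, Durability::Volatile};
```

With these definitions, compatible(kSensorData, kDefault) is false (a best-effort publisher with a reliable subscription, the rqt case from this thread), while compatible(kDefault, kSensorData) is true.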

So while on the face of it, it’s understandable to want what you’re asking for, it is unfortunately not something we can easily implement.

In the meantime, the feature that allows reconfiguring the QoS without recompiling at least lets you make it work.

I think I’m missing something.
By permissive, I was referring to installation-time system-default QoS profiles, where users who don’t override anything are more likely to land on parts of the compatibility matrix where it does just work.
i.e.:

  • default publication is reliable, transient local
  • default subscription is best-effort, volatile

This would be unrelated to the override system (which looks great), and would simply be a node-writer’s convention for selecting safe defaults where it’s not critical.

I’m not fully up to date on QoS interactions. Does publishing reliable into a best-effort subscription cause occasional message duplication?

In that configuration, data would be best effort and volatile by default. We want it to be reliable and volatile by default (matching ROS 1’s style).

This is because when the publisher is reliable and the subscription is best effort the data goes best effort. Which again, is not what we want by default.

If you use the sensor data QoS on both sides then it will just work and also be best effort, but that’s on purpose.
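
As a rough illustration of the point above, the reliability actually delivered on a matched pair can be modelled as the weaker of the two settings. This is again a standalone toy, not a real API; the names matches and delivered are hypothetical.

```cpp
// Standalone toy model; all names are hypothetical, not the rclcpp API.

// Ordered from the weakest to the strongest guarantee.
enum class Reliability { BestEffort = 0, Reliable = 1 };

// The endpoints match only if the publisher offers at least what the
// subscription requests.
constexpr bool matches(Reliability offered, Reliability requested) {
  return offered >= requested;
}

// For a matched pair, data is delivered at the weaker of the two levels.
// The result is only meaningful when matches(offered, requested) is true.
constexpr Reliability delivered(Reliability offered, Reliability requested) {
  return offered < requested ? offered : requested;
}
```

So a reliable publisher with a best-effort subscription matches, but delivered(Reliability::Reliable, Reliability::BestEffort) is BestEffort, which is why that pairing is not used as the default.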

That clarifies a lot!

I didn’t realise DDS convention was to use the lower resource transport, so many design decisions make sense now.

I also didn’t realise that the DDS XML was expressive enough to set publisher and subscriber defaults separately.

So if I wanted to break ROS1 style, leave configuration hell, and enter network reliability hell, I should change my defaults at the DDS level and hope that the problem nodes are using rclcpp::SystemDefaultsQoS.

ros2_ws/DEFAULT_FASTRTPS_PROFILES.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<!-- This deviates from ROS1 behaviour in favour of not dealing with QoS issues -->
<dds>
    <profiles>
        <publisher profile_name="test_publisher_profile" is_default_profile="true">
            <qos>
                <reliability>
                    <kind>RELIABLE</kind>
                </reliability>
            </qos>
        </publisher>
        <subscriber profile_name="test_subscription_profile" is_default_profile="true">
            <qos>
                <reliability>
                    <kind>BEST_EFFORT</kind>
                </reliability>
            </qos>
        </subscriber>
    </profiles>
</dds>

This works for nodes that use rclcpp::SystemDefaultsQoS, but so few of them do that it’s hard to tell if this XML does anything (it does, on FastRTPS). Most nodes use rclcpp::QoS(N). This XML does not affect rqt; should I send a pull request?

Probably about time this question got an answer:

In retrospect, I don’t think gazebo_ros should be defaulting to the sensor data QoS. As we’ve seen, this just leads to confusion as to why subscriptions are not receiving data from Gazebo plugins, since subscriptions are defaulting to reliable. I guess it seemed intuitive when porting the “sensor” plugins to ROS 2 to use the sensor data QoS profile.

Just to say that I think sensor data publishers should not use SensorDataQoS, unless they are somehow resource limited. If they do, there is no way for clients to get reliable delivery. In contrast, if the client asks for best effort, it can get it in any case.

This is a problem in many current device drivers.

You’re absolutely right, and I think we reached a suitable compromise.
Gazebo plugins now use best-effort subscription, and reliable publication, using the .reliable() modifier over SensorDataQoS.

For an example of this in action, see this pull request:
