Hi, I was hoping to get a bit of clarity on something that’s been in the back of my mind regarding a migration to ROS2. At one point long ago, I remember reading that there was some consideration for building services over DDS pub/sub, and I thought ‘great, that means we’ll probably get first-class support for service bagging’. Lack of service recording is a huge pain point with using bags as a system logging mechanism in ROS1.
At some point, the landscape shifted and ROS2 services were implemented via DDS-RPC instead of pub/sub, which I imagine precludes using a side-channel recording mechanism like bagging. Sadly, with ROS2 actions being (rightfully) implemented via services, this means neither services nor actions are baggable in ROS2. That’s really a shame - I’m sure I’m not the only one who occasionally used ROS1 actions over services in some capacity just because they were baggable.
Understanding that ROS2 development is a sea of shifting priorities - is this something that would even be possible to resolve without fundamentally upending the design? Can it be ‘solved’ at the rmw implementation layer, or would it require more fundamental changes?
And other implementations, like OpenSplice, use our own version of this based on topics.
However, Services in ROS 2 do not have to be implemented with Topics, as you pointed out. They are their own concept in the rmw API; this was done to allow for optimizations for Services if desired. Always building them on top of Topics would perhaps not be the most efficient thing to do.
What’s prevented us from recording them is the lack of an rmw API that lets a third party (like rosbag) observe the exchanges between a client and server. We could add this API, though it may be hard to implement for future rmw implementations that don’t use a one-to-many or many-to-many mechanism for services, which presumably is where the efficiency gains would come from (by using a one-to-one comm pattern like gRPC/HTTP2 or something).
One thought, that wouldn’t require changes to rmw, is that we could simply republish requests and responses on a well-known topic name. For example, if the service were called rosservice:///activate_death_ray, there might be a rostopic:///activate_death_ray/_request_log and rostopic:///activate_death_ray/_response_log, published to by the client and service respectively. They could be activated selectively (to keep overhead low; activating this by default for all services would be expensive, I think), and that would enable recording of the services.
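To make the naming scheme concrete, here is a minimal sketch in plain Python (no ROS dependency). Everything in it is an assumption for illustration: `log_topics_for`, `LoggedService`, and the `publish(topic, payload)` callback are hypothetical names, not part of rmw or rclpy. For brevity it collapses both sides into one wrapper, whereas in the idea above the client would publish the request and the server the response.

```python
from typing import Any, Callable, Tuple


def log_topics_for(service_name: str) -> Tuple[str, str]:
    """Return the (request, response) log topic names for a service name."""
    base = service_name.rstrip("/")
    return base + "/_request_log", base + "/_response_log"


class LoggedService:
    """Hypothetical wrapper: republishes service traffic only when enabled."""

    def __init__(
        self,
        name: str,
        handler: Callable[[Any], Any],
        publish: Callable[[str, Any], None],  # assumed publish(topic, payload) hook
        logging_enabled: bool = False,        # off by default to keep overhead low
    ) -> None:
        self.handler = handler
        self.publish = publish
        self.logging_enabled = logging_enabled
        self.request_topic, self.response_topic = log_topics_for(name)

    def call(self, request: Any) -> Any:
        # Republish the request before handling it, if logging is switched on.
        if self.logging_enabled:
            self.publish(self.request_topic, request)
        response = self.handler(request)
        # Republish the response after handling it.
        if self.logging_enabled:
            self.publish(self.response_topic, response)
        return response
```

With `logging_enabled=False` the wrapper never touches the publish hook, which is the "activated selectively" part of the idea.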
I think recording is the only thing that makes sense though. I don’t see how replaying requests or responses directly makes sense.
That’s an interesting suggestion. I would potentially take that one step further and introduce a new generic topic, similar to rosout or clock, that has all the service and action requests, responses, and feedback logged to it. I think that could actually have a variety of useful consequences.
For example, the topic may carry information like the action/service name, type, timestamp, a string/serialized version of the request/response/feedback, and the caller (assuming that’s available, which I think it is). The downside, clearly, is that in trying to make a general message to go over that topic, the request/response/feedback won’t be in their native types but strings or serialized blobs. Though it is conceivable that, with the type and the string/serialized data, a rosbag-server-recorder tool could convert them back into real types for playback/reading.
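A rough sketch of what such a generic record could look like, in plain Python. The field names, the `example/ActivateRequest` type, and the JSON encoding are all assumptions chosen for illustration; a real implementation would presumably carry the middleware's own serialized bytes instead.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class ActivateRequest:
    """Hypothetical native request type used only for this example."""
    power_level: int


@dataclass(frozen=True)
class ExchangeLogRecord:
    """One generic entry on the shared log topic."""
    name: str        # service or action name
    type_name: str   # identifies the native type of the payload
    kind: str        # "request", "response", or "feedback"
    stamp: float     # seconds since the epoch
    caller: str      # name of the calling node, if known
    payload: bytes   # serialized blob, not the native type


# Assumed lookup table a recorder tool would need to rebuild native types.
TYPE_REGISTRY = {"example/ActivateRequest": ActivateRequest}


def to_record(name: str, type_name: str, kind: str, caller: str,
              message: Any, stamp: Optional[float] = None) -> ExchangeLogRecord:
    """Flatten a native message into the generic record (JSON as a stand-in)."""
    payload = json.dumps(asdict(message)).encode("utf-8")
    when = time.time() if stamp is None else stamp
    return ExchangeLogRecord(name, type_name, kind, when, caller, payload)


def from_record(record: ExchangeLogRecord) -> Any:
    """Rebuild the native message using the type name stored in the record."""
    cls = TYPE_REGISTRY[record.type_name]
    return cls(**json.loads(record.payload.decode("utf-8")))
```

The round trip only works because the record carries the type name, which is the point made above: the blob alone is not enough to recover the real type.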
there might be a rostopic:///activate_death_ray/_request_log and rostopic:///activate_death_ray/_response_log, published to by the client and service respectively. They could be activated selectively (to keep overhead low; activating this by default for all services would be expensive, I think)
That’s a great approach! What would be the expensive part? I imagine if a bag recorder exists that whitelists activate_death_ray for recording, it could just subscribe to those topics. If no bag recorder exists, isn’t it a performance no-op to create a topic with no subscribers?
I think recording is the only thing that makes sense though. I don’t see how replaying requests or responses directly makes sense.
I imagine there are some esoteric cases where someone may want a bag playback to emit service calls to an external node, but I can’t think of any time I’ve seen a practical application.
new generic topic similar to rosout or clock
This would probably not be as useful as the namespaced version proposed by @wjwwood. A topic like that would be an all-or-nothing firehose, and potentially prohibitive to subscribe to from a large node… short of a DDS mechanism like keyed topics, which may not even be exposed in ROS2 (yet).
I would personally just find it annoying to have to enumerate all the servers to bag up: *_request, *_response, *_feedback for N servers. I prefer the one-stop-shop firehose that I can pare down when debugging.
There’s always overhead to having more things, even if there’s no actual match. There’s discovery traffic and memory usage at least. And since every node will likely have a few services the impact is large even if the individual fixed cost is small.
How would you log them? As a string? You cannot have different types on the same topic (or you shouldn’t). If you’re logging them as a string, you could just use rosout.
A parameter to rosbag could be used to toggle recording all topics matching *_response_log and *_request_log. It wouldn’t require the user to manually enumerate all services to record.
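A small sketch of how that toggle might select topics, using shell-style patterns from Python's standard `fnmatch` module. The function name `topics_to_record` and the `record_service_logs` flag are hypothetical; this is not an existing rosbag option.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

# Patterns taken from the suggestion above.
SERVICE_LOG_PATTERNS = ("*_request_log", "*_response_log")


def topics_to_record(
    available: Iterable[str],
    requested: Iterable[str],
    record_service_logs: bool = False,  # hypothetical recorder flag
) -> List[str]:
    """Return explicitly requested topics plus, optionally, all service log topics."""
    available = list(available)
    # Start with the topics the user named that actually exist.
    chosen = set(requested) & set(available)
    if record_service_logs:
        # Add every topic matching one of the service log patterns.
        chosen.update(
            topic
            for topic in available
            if any(fnmatchcase(topic, pattern) for pattern in SERVICE_LOG_PATTERNS)
        )
    return sorted(chosen)
```

The user names ordinary topics as usual and flips one flag to pick up every service log topic, with no per-service enumeration.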
I guess action feedback can already be recorded since it is a topic.
It’s a bit of a wild idea, and to make it feasible we would need to support the Windows and macOS tracing frameworks in addition to Linux, and we would need to look at making this location-transparent. But it is an existing, low-overhead, tunable, configurable data capture mechanism which can support any source.
That’s an intriguing idea, but I’m curious about how low the overhead would really be? One of the advantages of capturing via subscription is that you don’t directly impact the execution time of the sender (assuming you allow for enough computing resources, etc., of course). How much would capturing large data via tracing impact the execution time? Or is there zero impact?
That honestly sounds like a great idea, although probably a completely separate discussion :). Between that or something like bagging-via-pcap, it would be great to have an option to record data without having to depend on the underlying connections to function reliably.
It looks like the rosbag2 storage backend is pluggable, but unless I’m missing something, transport is not.
In general, I would expect it to have less impact than the current rosbag approach. Before going down that route, we should probably test this hypothesis, however.
The way this works is as follows: The tracing frameworks I’ve looked at make a copy of the data and store it in a lock-free ring buffer inside the process for later retrieval by the capture process. That happens at the point where the tracepoint is inserted, and it will block until the data has been copied.
The later retrieval and disk storage by the capture process happens asynchronously, however, so – apart from consuming CPU and disk bandwidth – storage does not impact the traced process.
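The split between a synchronous copy and an asynchronous drain can be sketched roughly as follows. This is a toy model under stated assumptions: it uses a bounded `collections.deque` as a stand-in for the ring buffer (it is not a real lock-free structure), `TraceBuffer` is a hypothetical name, and overwriting the oldest entry when full is just one possible policy.

```python
from collections import deque
from typing import List, Tuple


class TraceBuffer:
    """Toy in-process trace buffer: copy on the hot path, drain later."""

    def __init__(self, capacity: int) -> None:
        # Bounded deque: when full, appending discards the oldest entry.
        self._ring = deque(maxlen=capacity)
        self.overwritten = 0

    def tracepoint(self, name: str, data) -> None:
        """Called by the traced code; returns only after the data is copied."""
        snapshot = bytes(data)  # the copy is the only cost paid by the caller
        if len(self._ring) == self._ring.maxlen:
            self.overwritten += 1
        self._ring.append((name, snapshot))

    def drain(self) -> List[Tuple[str, bytes]]:
        """Called by the capture side, at its own pace, to collect events."""
        events = []
        while True:
            try:
                events.append(self._ring.popleft())
            except IndexError:
                return events
```

Because `tracepoint` stores a copy, the sender can reuse or modify its buffer immediately afterwards, and whatever `drain` later does with the events (for example, writing them to disk) happens off the sender's path.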
Of course, the exact impact also depends on where you’re putting the tracepoint. The easiest way would be to trace messages when they are serialized. That way, you don’t have to deal with the exact type, and just store a byte array. This would be the same pathway that also sends data to rosbag via subscription.
Now, when you have an application that sends around images intra-process, they would not normally be serialized. Adding any kind of recording will change that, and thus impact the system. With tracing, you could, in principle, generate a message-type specific tracepoint, however, and avoid (DDS) serialization. That’s advanced stuff, but it could be done.
What I’m not sure about is how this all compares, performance-wise, to the new shared-memory transports that are currently being introduced.