The method can extract and visualize the path of a message across a ROS 2 system. It works for distributed systems (any number of hosts!), and also supports user-level links between input and output messages. There’s also a visualization of the state of the executor instances over time.
This is a bit off-topic, but ISP and TIER IV are also working on a similar tool for tracing Autoware.universe. CARET derived ideas from ros2_tracing and RAPLET.
RAPLET: Demystifying publish/subscribe latency for ROS applications.
First of all, let me share a brief announcement about this one.
CARET
Features and differences from ros2_tracing are listed below.
Features:
Low overhead with LTTng-based tracepoints for sampling events in the ROS/DDS layer
Flexible tracepoints added by function hooking with LD_PRELOAD
Python-based API for flexible data analysis and visualization
Application-layer event tracing in cooperation with TILDE, a runtime message tracer
Differences:
CARET-dedicated tracepoints are added by function hooking with LD_PRELOAD
CARET also utilizes existing tracepoints for ros2_tracing
Code implemented with C++ templates could not be hooked with LD_PRELOAD, so we added a few tracepoints to rclcpp directly
Our main target is the ROS 2/DDS layer; OS events such as sched:wakeup are out of scope
CARET will trace the /tf topic after the v0.3.x release; this feature is currently under test
We observe and visualize data in Jupyter notebooks using the Python-based API provided by CARET
To handle the difficulty of calculating latency for a node with complicated dependencies between its inputs and outputs, we use wrappers for publishers and subscriptions to annotate each message
CARET will cooperate with another tool, TILDE, a framework which detects deadline overrun. This is under development.
Only single-host applications are supported; CARET cannot be applied to an application that runs on multiple machines over a network
CARET separates path selection and latency calculation.
Select the path to be evaluated from the node graph
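To make the split between path selection and latency calculation concrete, here is a minimal sketch in plain Python. It is not CARET's API: the node graph, the node names, the timestamps, and the helpers `find_paths` and `path_latency` are all made up for illustration. The idea is only that one step picks a path out of the node graph, and a separate step computes latency for a path that has already been chosen.

```python
from typing import Dict, List

# Hypothetical node graph: node name -> names of downstream nodes.
NODE_GRAPH: Dict[str, List[str]] = {
    "sensor": ["filter", "logger"],
    "filter": ["planner"],
    "logger": [],
    "planner": ["controller"],
    "controller": [],
}


def find_paths(graph: Dict[str, List[str]], start: str, goal: str) -> List[List[str]]:
    """Step 1: enumerate every simple path from start to goal (depth-first)."""
    paths = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == goal:
            paths.append(path)
            continue
        for nxt in graph.get(node, []):
            if nxt not in path:  # never revisit a node, so cycles are skipped
                stack.append(path + [nxt])
    return paths


def path_latency(path: List[str], timestamps_ns: Dict[str, int]) -> int:
    """Step 2: latency of one message along an already-selected path.

    timestamps_ns maps each node to the (made-up) time at which the
    message was observed at that node.
    """
    return timestamps_ns[path[-1]] - timestamps_ns[path[0]]


# Made-up observation times in nanoseconds for a single message.
SAMPLE_NS = {
    "sensor": 0,
    "filter": 2_000_000,
    "planner": 7_000_000,
    "controller": 9_500_000,
}

candidates = find_paths(NODE_GRAPH, "sensor", "controller")
selected = candidates[0]  # the user picks which path to evaluate
latency_ms = path_latency(selected, SAMPLE_NS) / 1e6
print(selected, latency_ms)
```

Keeping the two steps apart means the same trace data can be re-evaluated for a different path without touching the latency code.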
Our team has been so eager to trace Autoware.universe with CARET that we were late in introducing this tool to the ROS community.
First of all, since CARET’s goal is very similar to that of ros2_tracing, I’m willing to share my experience and what I learned from applying CARET to a large software project, Autoware.universe, if you are interested.
@hsgwa I knew about RAPLET (and we included it in both the ros2_tracing paper and the new message flow paper), but CARET looks very nice! You should definitely announce these kinds of tools in their own posts as soon as they’re ready!
Being able to select a specific path in the DAG is a good feature. I’m looking forward to learning more about TILDE, whenever it is released. I’m also interested in seeing how you support /tf and other pub/sub extensions (e.g., message_filters, image_transport). As mentioned in our paper, since we want our method to work on existing systems out of the box, detecting links between input and output messages without pub/sub wrappers is quite a challenge.
This is a fair question. While our overall goals are similar in general (i.e., tracking messages and getting various timing-related data), as you pointed out, the exact features are different. Besides supporting distributed systems, our goal with this paper is really to be able to extract and display execution data that we can compare with data from other sources (e.g., Linux kernel and other application-level data) to get the full picture in order to analyze the performance of the whole system. This is why our implementation uses Trace Compass. Therefore we do not plan on porting this method over to tracetools_analysis.
I would definitely be interested in reading a full standalone post about this!
@christophebedard Thanks for the comment!
We’ll post a new article as soon as it is ready.
detecting links between input and output messages without pub/sub wrappers is quite a challenge.
We’re not completely ready to deal with them yet, but the development version can measure /tf with just a hook using LD_PRELOAD.
However, intra-node sub-pub links are difficult to detect in arbitrary node implementations, so we are thinking of using wrappers; TILDE is a pub-sub wrapper.
We’ll also announce TILDE when it is ready.
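For readers wondering what a pub-sub wrapper for intra-node links could look like, here is a rough sketch in plain Python. It is not TILDE's or CARET's actual interface, and it does not use rclcpp or rclpy: `Stamped`, `AnnotatingNode`, and the fake clock are hypothetical names invented for this example. It also makes a simplifying assumption that every input consumed since the last output contributed to the next output.

```python
import itertools
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional

# Fake monotonic clock for the demo: yields 100, 110, 120, ...
_clock = itertools.count(start=100, step=10)


@dataclass
class Stamped:
    """A payload plus the stamps of the input messages it was derived from."""
    data: Any
    stamp: int
    sources: List[int] = field(default_factory=list)


class AnnotatingNode:
    """Hypothetical wrapper around one node's subscription and publisher."""

    def __init__(self, callback: Callable[[Any], Optional[Any]],
                 publish: Callable[[Stamped], None]) -> None:
        self._callback = callback
        self._publish = publish
        self._consumed: List[int] = []

    def on_message(self, msg: Stamped) -> None:
        # Subscription wrapper: remember which input was consumed.
        self._consumed.append(msg.stamp)
        result = self._callback(msg.data)
        if result is not None:
            self.publish(result)

    def publish(self, data: Any) -> None:
        # Publisher wrapper: attach the stamps of all inputs seen since
        # the previous output (assumption: they all contributed).
        out = Stamped(data, next(_clock), sources=self._consumed)
        self._consumed = []
        self._publish(out)


# Demo node logic: publish the sum of every two inputs.
buffer: List[int] = []


def sum_pairs(value: int) -> Optional[int]:
    buffer.append(value)
    if len(buffer) == 2:
        total = sum(buffer)
        buffer.clear()
        return total
    return None


outputs: List[Stamped] = []
node = AnnotatingNode(sum_pairs, outputs.append)
node.on_message(Stamped(1, stamp=10))
node.on_message(Stamped(2, stamp=20))

# Per-input latency can now be read off the annotated output.
latencies = [outputs[0].stamp - s for s in outputs[0].sources]
print(outputs[0], latencies)
```

The point of the wrapper is that the node's own callback stays unchanged, while the output message carries enough information to link it back to its inputs.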