We would like to announce the Ros2TraceAnalyzer tool. Its aim is to analyze traces obtained with ros2_tracing in a similar way to the tracetools_analysis package, but faster and with additional analyses.
The tool is written in Rust and, according to our benchmarks, processes traces about 30× faster than tracetools_analysis. On a trace of a medium-sized project with 1.8 million events (78 MB), Ros2TraceAnalyzer needed 4.2 s to process and analyze the trace, while tracetools_analysis needed 125 s for the same trace. Ros2TraceAnalyzer can also analyze traces on which tracetools_analysis does not work well: tracetools_analysis treats callbacks from different processes that happen to have the same address as one and the same callback and merges them in the graph.
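The merging problem arises because each process has its own virtual address space, so an address alone does not identify a callback across processes. A minimal sketch of one way to avoid it, assuming a hypothetical simplified event record `CallbackEvent` that carries the ID of the emitting process (this is not Ros2TraceAnalyzer's actual data model), is to key callbacks by the pair (process ID, address):

```rust
use std::collections::HashMap;

/// Hypothetical, simplified callback-execution event (not the tool's real data model).
#[derive(Debug, Clone, Copy)]
struct CallbackEvent {
    pid: u32,         // ID of the process that emitted the event
    address: u64,     // callback address, only meaningful inside that process
    duration_ns: u64, // measured execution time of one callback run
}

/// Groups executions by address only: callbacks of different processes collide.
fn group_by_address(events: &[CallbackEvent]) -> HashMap<u64, Vec<u64>> {
    let mut groups: HashMap<u64, Vec<u64>> = HashMap::new();
    for e in events {
        groups.entry(e.address).or_default().push(e.duration_ns);
    }
    groups
}

/// Groups executions by (pid, address): equal addresses in different processes stay apart.
fn group_by_process_and_address(events: &[CallbackEvent]) -> HashMap<(u32, u64), Vec<u64>> {
    let mut groups: HashMap<(u32, u64), Vec<u64>> = HashMap::new();
    for e in events {
        groups.entry((e.pid, e.address)).or_default().push(e.duration_ns);
    }
    groups
}

fn main() {
    // Two processes whose callbacks happen to live at the same virtual address.
    let events = [
        CallbackEvent { pid: 100, address: 0x5555_0000, duration_ns: 1_000 },
        CallbackEvent { pid: 200, address: 0x5555_0000, duration_ns: 50_000 },
        CallbackEvent { pid: 100, address: 0x5555_0000, duration_ns: 1_200 },
    ];
    // Keyed by address alone, all three runs look like one callback.
    assert_eq!(group_by_address(&events).len(), 1);
    // Keyed by (pid, address), the two callbacks are kept separate.
    let separate = group_by_process_and_address(&events);
    assert_eq!(separate.len(), 2);
    assert_eq!(separate.get(&(100u32, 0x5555_0000u64)), Some(&vec![1_000u64, 1_200]));
    println!("{} distinct callbacks", separate.len());
}
```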
Ros2TraceAnalyzer can also present the results as an interactive dependency graph with timing quantiles, as shown below.
Ros2TraceAnalyzer can analyze C++ applications for ROS 2 Jazzy. For Python applications, only message latencies can be analyzed, because rclpy doesn’t include tracepoints for callbacks. We have also developed tracing support for Rust applications written with the r2r library. This is not yet included in the upstream repository, but it is available in the lttng-usr branch of our fork.
We consider the release to be of alpha quality, and we’re interested in feedback from the community.
From what I understand about the Trace Compass plugin, it does not allow data aggregation. In Trace Compass, each edge represents an individual message (one edge = one message), and you cannot view a summary over the same repeating edge. We merge the edges to calculate quantile/percentile values (one edge = multiple messages of the same type), as shown in the picture above.
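The aggregation described above can be sketched as follows: latencies of all messages that travel along the same edge are collected, and a quantile is computed over them. This is a minimal sketch under stated assumptions: the edge key (topic name, receiving node name), the microsecond latencies, and the nearest-rank quantile definition are all illustrative choices, not necessarily what Ros2TraceAnalyzer uses.

```rust
use std::collections::BTreeMap;

/// Hypothetical edge key: (topic name, receiving node name).
type EdgeKey = (String, String);

/// Nearest-rank quantile of `samples` for `q` in [0, 1]; sorts `samples` in place.
fn quantile(samples: &mut [u64], q: f64) -> Option<u64> {
    if samples.is_empty() || !(0.0..=1.0).contains(&q) {
        return None;
    }
    samples.sort_unstable();
    // Smallest sample such that at least a fraction q of all samples is <= it.
    let rank = (q * samples.len() as f64).ceil() as usize;
    Some(samples[rank.max(1) - 1])
}

/// Merges per-message latencies belonging to the same edge (one edge = many messages).
fn aggregate(messages: &[(EdgeKey, u64)]) -> BTreeMap<EdgeKey, Vec<u64>> {
    let mut edges: BTreeMap<EdgeKey, Vec<u64>> = BTreeMap::new();
    for (edge, latency_us) in messages {
        edges.entry(edge.clone()).or_default().push(*latency_us);
    }
    edges
}

fn main() {
    let key = |topic: &str, node: &str| (topic.to_string(), node.to_string());
    let messages: Vec<(EdgeKey, u64)> = vec![
        (key("/scan", "planner"), 120),
        (key("/scan", "planner"), 95),
        (key("/scan", "planner"), 4_000), // a single outlier
        (key("/scan", "planner"), 110),
        (key("/odom", "planner"), 40),
    ];
    for ((topic, node), mut latencies) in aggregate(&messages) {
        let median = quantile(&mut latencies, 0.5).unwrap();
        let max = quantile(&mut latencies, 1.0).unwrap();
        println!("{topic} -> {node}: n={} median={median} us max={max} us", latencies.len());
    }
}
```

The summary hides which message was the outlier, but a high maximum or upper quantile on an edge tells you where to start looking in the raw trace.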
What would be the benefit of a summary? In all my use cases of tracing, I am interested in the exact circumstances of a certain sequence of events that led to my timing ‘outliers’. If you summarize, you can only say that there was an outlier, but you can’t reconstruct the events that caused it…
Or is the use case of this tool more about finding out that you have timing outliers in the first place?
This tool is meant to be used to quickly verify that the system is behaving as expected, e.g., the communication actually happens, and the callbacks execute in the expected time.
Yes, this tool can be useful for finding out whether there is an outlier in a large trace at all. You can then use Trace Compass to look it up and analyze the sequence of events.
I’ve looked at the source code and I don’t understand how your tool connects to ROS 2 at all. It doesn’t seem to be based on ros2_rust or r2r. Is it just a frontend for ros2trace?
Yes, it could be considered a frontend for ros2trace. The tool processes trace events generated by tracepoints in rclcpp, rcl, and rmw. Our r2r fork (for now) extends r2r with rclcpp-compatible tracepoints so that you can trace and analyze r2r applications the same way as C++ ROS applications.
If I understand the tool correctly, I think it would be more appropriate to describe it as a consumer (i.e., “analyzer”) of the trace data obtained using a combination of (1) the built-in ROS 2 tracing instrumentation and (2) custom tracing instrumentation in r2r that mimics the rclcpp instrumentation.
ros2trace is just the package that provides the ros2 trace command, which allows you to enable tracing and collect trace data from a running application. As far as I understand, you just use ros2 trace as-is when using Ros2TraceAnalyzer, correct?
@wentasah @skoudmar is the tracing instrumentation in r2r exactly the same as the rclcpp instrumentation? Did you have to modify or adapt anything because of the Rust language? If so, it would be interesting to know. @wentasah mentioned at ROSCon that there was an issue with pointer addresses not matching up because of Rust.
Sorry for the confusion. You are right: this tool does not capture traces, it only analyzes them.
Before capturing the trace with ros2 trace or with LTTng directly:
For C++, nothing additional is needed. The built-in ROS 2 tracing instrumentation is used, so it should work out of the box.
For Rust, our fork of r2r is needed, and subscriber and timer callbacks must be registered through the new functions Node::subscribe_trace and Timer::on_tick, respectively.
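Callbacks have to be registered through dedicated functions because r2r does not normally receive them at all, so there would be nothing to instrument. The following sketch illustrates the general idea of wrapping a registered callback; `register_traced`, `TraceLog`, and `NEXT_CALLBACK_ID` are hypothetical names (not the API of the r2r fork), and the in-memory log merely stands in for real LTTng tracepoints. Each registration gets a sequential ID, so registering the same function twice still yields two distinguishable callbacks.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};
use std::time::Instant;

/// Stand-in for emitted tracepoints: (callback ID, event kind, timestamp). Hypothetical.
type TraceLog = Arc<Mutex<Vec<(u64, &'static str, Instant)>>>;

/// Global counter handing out sequential callback IDs (hypothetical).
static NEXT_CALLBACK_ID: AtomicU64 = AtomicU64::new(0);

/// Hypothetical registration helper: assigns a fresh ID and wraps `f` so that each
/// invocation records a "start" and an "end" event. Not the actual r2r fork API.
fn register_traced<T, F>(log: TraceLog, mut f: F) -> (u64, impl FnMut(T))
where
    F: FnMut(T),
{
    let id = NEXT_CALLBACK_ID.fetch_add(1, Ordering::Relaxed);
    let wrapped = move |msg: T| {
        log.lock().unwrap().push((id, "start", Instant::now()));
        f(msg); // the user's callback runs between the two events
        log.lock().unwrap().push((id, "end", Instant::now()));
    };
    (id, wrapped)
}

fn on_message(msg: String) {
    println!("received {msg}");
}

fn main() {
    let log: TraceLog = Arc::new(Mutex::new(Vec::new()));
    // The same function registered twice: an ID derived from the function pointer
    // would be identical for both, whereas sequential IDs keep them apart.
    let (id_a, mut cb_a) = register_traced(log.clone(), on_message);
    let (id_b, mut cb_b) = register_traced(log.clone(), on_message);
    cb_a("hello".to_string());
    cb_b("world".to_string());
    assert_ne!(id_a, id_b);
    let events = log.lock().unwrap();
    assert_eq!(events.len(), 4);
    assert_eq!(events.iter().filter(|e| e.0 == id_a).count(), 2);
}
```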
Yes, you run ros2 trace to capture the trace and then Ros2TraceAnalyzer to analyze the trace.
No, but we tried to make it as close as possible so that the resulting traces can still be opened in Trace Compass or tracetools_analysis.
There are some differences:
We could not use the executor tracepoints because r2r has no executor; instead, it uses async functions spawned on a user-chosen runtime such as tokio.rs.
r2r’s spin function does not execute callbacks; it only notifies the async runtime, which then runs them. We have therefore added new tracepoints to the spin function so that the latency between notification and execution can be measured.
Callbacks are not normally passed to r2r, so we created the functions Node::subscribe_trace and Timer::on_tick to register callbacks and wrap them in instrumentation.
For callbacks specifically, we use sequential IDs instead of function pointers, so that the same function can be reused for multiple callbacks.
We had problems with IDs because r2r initialized rcl objects on the stack and only afterwards moved them to a stable heap allocation, so the address recorded during initialization did not match the final one. In our fork of r2r, we initialize them directly on the heap.
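The address mismatch from the last point can be reproduced in a few lines. In this sketch, `RclHandle` and `init_in_place` are hypothetical stand-ins for an rcl object and an rcl-style initializer that records the object's address the way a tracepoint would; they are not real r2r or rcl APIs. Initializing on the stack and then moving into a `Box` changes the address, while allocating on the heap first and initializing in place keeps it stable.

```rust
/// Hypothetical stand-in for an rcl object such as a subscription handle.
struct RclHandle {
    initialized: bool,
}

/// Hypothetical initializer that, like a tracepoint in an rcl init function,
/// records the address of the object it initializes.
fn init_in_place(handle: &mut RclHandle, recorded: &mut Vec<usize>) {
    handle.initialized = true;
    recorded.push(handle as *const RclHandle as usize);
}

/// Address at which the object lives from now on.
fn address_of(handle: &RclHandle) -> usize {
    handle as *const RclHandle as usize
}

fn main() {
    let mut recorded = Vec::new();

    // Problematic pattern: initialize on the stack, then move into a heap allocation.
    let mut on_stack = RclHandle { initialized: false };
    init_in_place(&mut on_stack, &mut recorded);
    let moved: Box<RclHandle> = Box::new(on_stack);
    assert!(moved.initialized); // the contents survive the move...
    assert_ne!(recorded[0], address_of(&moved)); // ...but the traced address is stale

    // Fix: allocate on the heap first, then initialize in place.
    let mut on_heap = Box::new(RclHandle { initialized: false });
    init_in_place(&mut on_heap, &mut recorded);
    assert_eq!(recorded[1], address_of(&on_heap)); // traced address stays valid

    println!("stack-then-move: {:#x} -> {:#x}", recorded[0], address_of(&moved));
}
```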