Message Flow Analysis for ROS Through Tracing

As part of an undergraduate research project, I used Trace Compass, a trace viewer and analysis framework, to leverage existing ROS instrumentation and create an analysis that can draw the path of a message through ROS nodes. It can show how much time a message spent inside queues and callbacks.
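To give a rough idea of what “time spent inside queues and callbacks” means, here is a minimal sketch in plain Python. It is not the Trace Compass analysis itself: the `Segment` type, the `summarize_path` function, the node names, and the timestamps are all made up for illustration, and it assumes the path of one message has already been reconstructed as a list of timed segments.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One hop on a message's path (hypothetical structure, for illustration)."""
    node: str
    kind: str        # either "queue" or "callback"
    start_ns: int
    end_ns: int

    @property
    def duration_ns(self) -> int:
        return self.end_ns - self.start_ns


def summarize_path(segments):
    """Sum the time a message spent waiting in queues vs. running in callbacks."""
    totals = {"queue": 0, "callback": 0}
    for seg in segments:
        totals[seg.kind] += seg.duration_ns
    return totals


# Made-up path of a single message through three nodes.
path = [
    Segment("talker", "queue", 0, 1_500),
    Segment("relay", "callback", 1_500, 4_000),
    Segment("relay", "queue", 4_000, 4_700),
    Segment("listener", "callback", 4_700, 6_000),
]

totals = summarize_path(path)
print(totals)
```

Splitting the totals by segment kind makes it easy to see whether a message mostly waited or mostly ran.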

I’ve written a blog post to present my project, give more context on tracing & robotics, explain how everything works, and show off the resulting analysis.

Right now it’s mostly a proof-of-concept, but it highlighted a few paths that could be explored in the future.

Let me know if you have any questions or comments!

I’m very curious!
It looks like this was built for ROS 1. Are you also working on a ROS 2 project that can be analyzed with Trace Compass?

That’s right, this project was for ROS 1 only.

I am going to be working on something along those lines for ROS 2 (with ros2_tracing :grin:) in the coming year as part of my master’s, but the exact details are TBD. I’ll definitely post the (eventual) results here on Discourse!

Thanks for the quick answer!

I’m trying to use ros2_tracing and tracetools_analysis :wink:
I see that you parsed the CTF data with babeltrace and visualized it with jupyter.

It looks to me like Trace Compass is better suited for detailed analysis, as the OS-level task visualization is already in place.
Was there a particular reason for switching from Trace Compass to Jupyter?

The first goal with ros2_tracing/tracetools_analysis was to extract “raw” data as opposed to higher-level information (i.e. states analyses). We also wanted to allow people to quickly write their own scripts, so that’s why we chose Jupyter for the visualization.

You are right though, Trace Compass provides a lot of other useful information! It’s moving from Eclipse to a web-based IDE (Theia), which I know will make it more appealing to a lot of people, so I’ll probably be looking into that.

I’ve never heard of Theia before! It looks good!

I’m currently considering ways to measure and calculate node latency, end-to-end latency, and callback duration.
I’m still in the process of implementing a PoC, so it’s not something I can contribute right away, but I’m imagining it in the form of a Python library for now. (It’ll include fixing or adding tracepoints.)
I’m hoping to eventually merge it into tracetools_analysis, but I’d like to share the design documentation when it’s ready.
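To make the callback duration part a bit more concrete, here is a minimal sketch of the kind of calculation I have in mind. It is only an assumption about the shape of the data: events are plain dictionaries with hypothetical `name`, `callback`, and `timestamp` keys, the values are made up, and this does not use the tracetools_analysis API.

```python
from collections import defaultdict


def callback_durations(events):
    """Pair each start event with the next end event of the same callback.

    Returns a dict mapping callback id to a list of durations (same unit
    as the input timestamps). Unmatched end events are ignored.
    """
    open_starts = {}
    durations = defaultdict(list)
    for ev in sorted(events, key=lambda e: e["timestamp"]):
        cb = ev["callback"]
        if ev["name"] == "callback_start":
            open_starts[cb] = ev["timestamp"]
        elif ev["name"] == "callback_end" and cb in open_starts:
            durations[cb].append(ev["timestamp"] - open_starts.pop(cb))
    return dict(durations)


# Made-up events, deliberately out of order, for two callbacks.
events = [
    {"name": "callback_start", "callback": "cb_a", "timestamp": 100},
    {"name": "callback_start", "callback": "cb_b", "timestamp": 120},
    {"name": "callback_end", "callback": "cb_a", "timestamp": 350},
    {"name": "callback_end", "callback": "cb_b", "timestamp": 200},
    {"name": "callback_start", "callback": "cb_a", "timestamp": 400},
    {"name": "callback_end", "callback": "cb_a", "timestamp": 450},
]

durations = callback_durations(events)
print(durations)
```

Node latency and end-to-end latency would need more than this, since they require linking a received message to the messages published as a result of it.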

In addition, if there is some kind of roadmap for tracetools, I’d like to know about it.

Looking forward to it! If you need help or want to share your progress, just open an issue/MR on the relevant repo.

We have some ideas about the processing/analysis part, but nothing formal. As for tracepoints/instrumentation, there are a lot of possible additions. So far we’ve mostly worked on our own needs along with the needs other people have expressed (e.g. lifecycle nodes!). It might be a good idea to have a discussion about this as part of the RTWG to see what needs people have and what they’d like to see added to ros2_tracing/tracetools_analysis. I’ll bring it up in January!
