@christophebedard and I are working on an issue with ros2_tracing and we wanted to get some ideas from the community.
Right now, because of how the ROS 2 core is instrumented, we have to start tracing before launching the application in order to capture meaningful trace data for later analysis (see this issue). We’re aiming to address this limitation by implementing a feature that allows users to start tracing dynamically after the application has launched, as discussed in Find solution to allow starting to trace from any point in time · Issue #44 · ros2/ros2_tracing · GitHub.
Here’s the main use case we’re trying to support:
The user starts an application without starting to trace beforehand. Eventually, they decide to start tracing, either to record upcoming events or because they notice something went wrong during execution. A detailed description can be found in this comment.
We’d really appreciate your input:
- Do you encounter this kind of use case in your workflow?
- Are there other situations where post-launch tracing would be helpful?
- Any suggestions or feedback on how this feature should be implemented?
> Any suggestions or feedback on how this feature should be implemented?
I think it is important that, however new tracing affordances/features are added, they do not impede a solution to rclcpp#2860 that makes it possible to opt out of lttng-ust behavior at runtime in shipped ROS 2 binaries. Unconditionally starting threads at dynamic-load time is bad library behavior, and lttng-ust specifically documents linking it into applications, not libraries, because linking it into libraries forces its problematic behavior on all users. (Before “just build from source if you want to avoid it” gets brought up: that shouldn’t excuse including problematic behavior in the main ROS 2 binaries when solutions like weak symbols and/or runtime loading exist.)
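For reference, the runtime-loading route is something lttng-ust itself documents: the tracepoint provider is built into its own shared object, the application requests dynamic linkage, and `tracepoint()` calls stay inert until that object is `dlopen()`ed. A minimal sketch, with hypothetical names (the provider `my_provider`, its header, the library, and the environment variable are all illustrative, not ros2_tracing’s actual API; the macros are the lttng-ust 2.12 names, 2.13+ also offers `LTTNG_UST_`-prefixed equivalents):

```cpp
// In exactly one compilation unit of the application: request dynamic
// linkage so tracepoint() calls are no-ops until the provider is loaded.
#define TRACEPOINT_DEFINE
#define TRACEPOINT_PROBE_DYNAMIC_LINKAGE
#include "my_provider_tp.h"  // generated tracepoint provider header

#include <dlfcn.h>
#include <cstdlib>

int main()
{
  // lttng-ust (and its background threads) only enters the process if
  // the user opts in, e.g. via an environment variable.
  void * provider = nullptr;
  if (std::getenv("MY_APP_ENABLE_TRACING")) {
    provider = dlopen("libmy_provider_tp.so", RTLD_NOW | RTLD_GLOBAL);
  }

  // Inert if the provider shared object was not loaded above.
  tracepoint(my_provider, my_event, 42);

  if (provider) {
    dlclose(provider);
  }
  return 0;
}
```

Whether something like this is workable for the tracepoints baked into the ROS 2 core binaries is exactly the open question in rclcpp#2860.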
I think it’s related, and it will of course be considered when picking the eventual solution: overhead, UX, etc. I’ve been thinking about solutions to this for years now, and I’m definitely aware of the “cost” that this mechanism could have when it’s not used or needed.
Let’s discuss the issue you mentioned in the actual issue.
I used the ros2_tracing framework in Humble a bit in the past, in conjunction with custom tracepoints, for performance measurements in the Kria Robotics Stack (KRS) for FPGAs.
Things that I noted that might be interesting/relevant here:
- The key, when evaluating the traces, was linking them to the metadata based on their IDs, which, as of now, are just random memory pointers generated at start. A unique but stable ID for each subscription, etc. would allow generating the metadata once per “code modification” and simply loading it from a stored location instead of regenerating it at each start (see the first sketch after this list). I mainly used profiling once I was sure the application behaved as expected. Of course, this process was cyclic, but profiling always came as one of the last steps, when not many code changes were happening anymore.
- Depending on the use case, a relatively lightweight solution targeted at a specific group of nodes could involve components. This might not allow investigating everything, e.g. executor performance, but it provides a nice, “rosified” integration to enable/disable tracing (see the second sketch after this list). I am specifically referring to something similar to the Adaptive Component of ROS Acceleration.
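On the stable-ID point, a hedged sketch of one way to derive an ID that survives restarts: hash names that identify the endpoint (node, topic, type) instead of using the object’s memory address. The helper below is purely illustrative and not part of ros2_tracing; FNV-1a is used only because it is deterministic across runs, which `std::hash` does not guarantee.

```cpp
#include <cstdint>
#include <string>

// 64-bit FNV-1a over a string; the seed parameter lets hashes be chained.
uint64_t fnv1a(const std::string & s, uint64_t h = 14695981039346656037ULL)
{
  for (unsigned char c : s) {
    h ^= c;
    h *= 1099511628211ULL;  // FNV-1a 64-bit prime
  }
  return h;
}

// Hypothetical helper: node + topic + type names identify a subscription
// across runs as long as the code/graph does not change. Collisions are
// possible but unlikely enough for matching events to stored metadata.
uint64_t stable_subscription_id(
  const std::string & node_name,
  const std::string & topic_name,
  const std::string & type_name)
{
  return fnv1a(type_name, fnv1a(topic_name, fnv1a(node_name)));
}
```

Metadata generated once per build could then be keyed by these IDs and reloaded from disk across runs instead of being regenerated at every start.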
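On the component point, a minimal sketch of a “rosified” toggle, assuming tracing can be gated behind a process-local flag; the node name, service name, and flag are illustrative, and the actual ros2_tracing mechanism would look different:

```cpp
#include <atomic>
#include <memory>

#include "rclcpp/rclcpp.hpp"
#include "std_srvs/srv/set_bool.hpp"

// Process-local flag that instrumented code could check before emitting
// events (illustrative; not how ros2_tracing gates its tracepoints).
std::atomic<bool> g_tracing_enabled{false};

class TracingToggle : public rclcpp::Node
{
public:
  TracingToggle()
  : Node("tracing_toggle")
  {
    srv_ = create_service<std_srvs::srv::SetBool>(
      "enable_tracing",
      [this](
        const std::shared_ptr<std_srvs::srv::SetBool::Request> req,
        std::shared_ptr<std_srvs::srv::SetBool::Response> res)
      {
        g_tracing_enabled.store(req->data);
        res->success = true;
        res->message = req->data ? "tracing enabled" : "tracing disabled";
        RCLCPP_INFO(get_logger(), "%s", res->message.c_str());
      });
  }

private:
  rclcpp::Service<std_srvs::srv::SetBool>::SharedPtr srv_;
};

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  rclcpp::spin(std::make_shared<TracingToggle>());
  rclcpp::shutdown();
  return 0;
}
```

The flag could then be flipped at runtime with `ros2 service call /enable_tracing std_srvs/srv/SetBool "{data: true}"`.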