I’m considering how to measure the performance of ROS2 applications.
I’m still in the middle of this project, but I would appreciate any comments.
Goal
Measure latency and identify its causes.
Latency measurement
I’d like to measure
- callback duration
- communication latency
- node-latency
- end-to-end latency
in a real application.
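To make the four metrics concrete, here is a minimal sketch of how they could be derived from trace timestamps. The definitions are my working assumptions, not settled ones: communication latency is taken from publish to the start of the subscription callback, and node latency from callback start to the node's own publish. The `Hop` record and its field names are hypothetical, and all timestamps are assumed to come from the same clock.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hop:
    """Hypothetical record: timestamps (ns) for one message crossing one node."""
    publish_ns: int                      # upstream node publishes the input message
    callback_start_ns: int               # subscription callback starts
    callback_end_ns: int                 # subscription callback returns
    republish_ns: Optional[int] = None   # node publishes its output (None for a sink)


def callback_duration(hop: Hop) -> int:
    # Time spent inside the callback itself.
    return hop.callback_end_ns - hop.callback_start_ns


def communication_latency(hop: Hop) -> int:
    # Assumed definition: publish to callback start (includes scheduling wait).
    return hop.callback_start_ns - hop.publish_ns


def node_latency(hop: Hop) -> Optional[int]:
    # Assumed definition: callback start to the node's own publish.
    if hop.republish_ns is None:
        return None
    return hop.republish_ns - hop.callback_start_ns


def end_to_end_latency(hops: List[Hop]) -> int:
    # First publish in the chain to the end of the last callback.
    return hops[-1].callback_end_ns - hops[0].publish_ns


# Made-up numbers for a two-node chain.
hops = [
    Hop(publish_ns=0, callback_start_ns=300, callback_end_ns=900, republish_ns=800),
    Hop(publish_ns=800, callback_start_ns=1000, callback_end_ns=1500),
]
for hop in hops:
    print(callback_duration(hop), communication_latency(hop), node_latency(hop))
print(end_to_end_latency(hops))
```

With these definitions, communication latency still contains the time a message waits for the executor, which is why a trace point closer to the DDS layer matters.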
For example, I’d like to create the following diagram.
Visualizing Behavior
ROS2 handles communication and scheduling internally, so I think an application may behave differently from what the developer intended, resulting in poor performance.
Examples of settings that affect this are communication QoS, callback groups, and callback priority.
So, I’d like to create the following diagram.
In terms of content, the following article does something similar.
WIP
I think the following trace points are mainly necessary for measurement.
Since I’m still experimenting, I’m mainly using LD_PRELOAD hooks to add trace points.
However, ROS2 has some header implementations that cannot be hooked.
To cover those header-only parts, I also add trace points to forked versions of rclcpp and ros2_tracing.
In the future, I’m considering proposing new trace points to ROS if the measurements seem useful.
Problem
I’m currently trying to use FastDDS and CycloneDDS for measurements, but on_data_available is not working as expected.
on_data_available is not used in rmw_cyclonedds, and there may be better places to insert trace points.
I think such a trace point is necessary, though, if we want to evaluate pure communication latency (that is, latency that does not depend on scheduling).
I don’t know much about the DDS and rmw, so I would be glad if you could comment on this as well.
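To illustrate why I want a trace point at the data-arrival stage, here is a small sketch of how one communication latency sample could be split into a transport part and a scheduling part. The function name is hypothetical, `data_available_ns` stands for whatever point ends up marking data arrival in the DDS layer (which is exactly the open question), and all three timestamps are assumed to share one clock.

```python
from typing import Tuple


def split_communication_latency(
    publish_ns: int, data_available_ns: int, callback_start_ns: int
) -> Tuple[int, int]:
    """Split publish-to-callback latency into (transport, scheduling) parts.

    transport  : publish -> data arrives at the subscriber side
    scheduling : data arrives -> subscription callback starts
    """
    if not (publish_ns <= data_available_ns <= callback_start_ns):
        raise ValueError("timestamps must be ordered: publish, data available, callback start")
    transport_ns = data_available_ns - publish_ns
    scheduling_ns = callback_start_ns - data_available_ns
    return transport_ns, scheduling_ns


# Made-up numbers: 120 ns in transport, 180 ns waiting to be scheduled.
transport, scheduling = split_communication_latency(0, 120, 300)
print(transport, scheduling)
```

Without the middle timestamp, only the sum of the two parts is observable, so a slow executor and a slow transport cannot be told apart.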