We are releasing open source benchmark tooling for ROS 2 that measures the performance of graphs of nodes. This tooling enables realistic assessment of robotics applications under load, including message transport costs in RCL, for practical benchmarks indicative of real-world performance. It does not require modifying nodes to measure results, and it standardizes input rosbag datasets so benchmark results can be independently verified.
ros2_benchmark for Humble is available now at github.com/NVIDIA-ISAAC-ROS/ros2_benchmark.
The r2b dataset 2023, which provides standard input data for benchmarking, is available for download from NGC.
ros2_benchmark follows industry best practices and is professionally hardened for throughput and latency measurement of graphs of nodes in real-time robotics applications, including:
- Dependable results - automated performance measurements run for multiple seconds, N times (default N = 5), discarding the minimum and maximum results to reduce variability. Benchmark results are reported in log files for import into your visualization tool of choice.
- Input dataset - available for download from NGC under a Creative Commons Attribution 4.0 license, the r2b dataset 2023 provides consistent input to the graph from rosbag. Additional input data can be added when needed.
- Input image resolutions - with the broad range of computing hardware available, image processing is benchmarked at different resolutions depending on the robotics application.
- Input & output transport time - time spent in RCL publishing and receiving messages, for both inter- and intra-process communication, is included in the measurement results. This accurately represents what can be expected in a robotics application and avoids the inflated results that come from excluding message-passing costs.
- Input & output type adaptation - input data can be injected using standard ROS types, or using type adaptation and type negotiation.
- Benchmark parameters - the parameters used for testing, including data input length, publishing rate, and input size, can be customized with a configuration file.
- Throughput auto finder - measuring peak throughput of the graph, with <1% topic drops, requires automatically finding that peak. The throughput auto finder efficiently searches for the input data publishing rate that achieves peak throughput.
- Real-time latency - a fixed topic publishing rate provides a real-time measure of latency. This shows what is delivered to the real-time system at the target fixed rate, whereas throughput shows the peak performance possible for the robotics application.
- Cloud native - measurements can be performed on Kubernetes as part of automated testing or nightly CI/CD runs in a modern software development workflow. Measurements can also be performed on local developer systems.
- Opaque testing - graphs of nodes are tested as binaries, with all performance measurement tooling contained in the benchmark itself; no code in the graph under test is modified. This enables unobtrusive performance measurement of everything from open source to proprietary solutions with the same tools.
- Transparency - results in JSON include the parameters used to run the benchmark, including an MD5 checksum of the input rosbag for independent verification of results.
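The "dependable results" methodology above (run N times, discard the min and max results) can be sketched in a few lines. This is an illustrative example only, not ros2_benchmark's actual implementation; the function name and sample values are hypothetical:

```python
# Sketch of the "run N times, discard min and max" aggregation described
# above. Illustrative only -- not ros2_benchmark's actual code.

def trimmed_mean(samples: list[float]) -> float:
    """Average the samples after discarding the single min and max values."""
    if len(samples) < 3:
        raise ValueError("need at least 3 samples to discard min and max")
    trimmed = sorted(samples)[1:-1]  # drop one low and one high outlier
    return sum(trimmed) / len(trimmed)

# Example: five frame-rate measurements from N = 5 runs (made-up numbers).
runs = [66.1, 66.4, 66.5, 66.3, 71.9]  # the last run is an outlier
print(trimmed_mean(runs))  # 66.1 and 71.9 are discarded before averaging
```

Discarding the extremes rather than averaging everything keeps a single anomalous run (a background process waking up, a thermal event) from skewing the reported number.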
During the development of type adaptation (REP-2007) and type negotiation (REP-2009) in Humble to improve the performance of graphs of ROS 2 nodes, we needed a way to consistently measure performance and improvements. Prior benchmarks for ROS focused on measuring the performance of publishing and receiving topics in the same process, or across processes using DDS. We needed to measure the performance of graphs of nodes, including both the time spent publishing topics and the computation within the graph, in an automated, cloud-native, consistent way that best represents what can be expected in a robotics application. ros2_benchmark was created to provide this.
The r2b dataset provides sensor data captured in multiple scenarios and compressed into rosbags as input data for benchmarking. This includes high-precision, time-synchronized sensor data from 3D LIDAR, stereo cameras, and IMUs.
ros2_benchmark is our professional-grade tool for measuring all of the results we publish for Isaac ROS. It is part of our CI/CD process to measure performance nightly and catch regressions; we run this tool on 7 platforms spanning aarch64 and x86_64 architectures across hundreds of graph configurations. The tool has been run tens of thousands of times in automated cloud-native testing over more than a year, and on developer systems.
A sample subset of the output JSON log for the Stereo Image Proc node (CPU-only) running on Jetson AGX Orin is provided below.
{
  "BasicPerformanceMetrics.RECEIVED_DURATION": 4981.019205729167,
  "BasicPerformanceMetrics.MEAN_PLAYBACK_FRAME_RATE": 66.44940807620169,
  "BasicPerformanceMetrics.MEAN_FRAME_RATE": 66.45226335469312,
  "BasicPerformanceMetrics.NUM_MISSED_FRAMES": 0.0,
  "BasicPerformanceMetrics.NUM_FRAMES_SENT": 331.0,
  "BasicPerformanceMetrics.FIRST_SENT_RECEIVED_LATENCY": 14.696207682291666,
  "BasicPerformanceMetrics.LAST_SENT_RECEIVED_LATENCY": 14.484619140625,
  "BasicPerformanceMetrics.MAX_JITTER": 2.924560546875,
  "BasicPerformanceMetrics.MIN_JITTER": 0.000244140625,
  "BasicPerformanceMetrics.MEAN_JITTER": 0.2223148983662614,
  "BasicPerformanceMetrics.STD_DEV_JITTER": 0.28068860820818814,
  "CPUProfilingMetrics.MAX_CPU_UTIL": 27.633333333333336,
  "CPUProfilingMetrics.MIN_CPU_UTIL": 25.233333333333334,
  "CPUProfilingMetrics.MEAN_CPU_UTIL": 25.897777777777776,
  "CPUProfilingMetrics.STD_DEV_CPU_UTIL": 0.6463663508576992,
  "CPUProfilingMetrics.BASELINE_CPU_UTIL": 26.008333333333336,
  "custom": {
    "data_resolution": "Quarter HD (960,540)"
  }
}
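Because the log is plain JSON, it can be post-processed with a few lines of Python. The metric keys below come from the sample above, but the script itself is an illustrative sketch, and the file path in the usage comment is an assumption:

```python
# Sketch: pull a few headline metrics out of a ros2_benchmark JSON log.
# Illustrative only; key names are from the sample log shown above.
import json

def summarize(log_text: str) -> dict:
    """Extract mean frame rate, mean CPU utilization, and mean jitter."""
    results = json.loads(log_text)
    return {
        "fps": results["BasicPerformanceMetrics.MEAN_FRAME_RATE"],
        "cpu_pct": results["CPUProfilingMetrics.MEAN_CPU_UTIL"],
        "jitter": results["BasicPerformanceMetrics.MEAN_JITTER"],
    }

# Usage with a saved log file (path is hypothetical):
# with open("stereo_image_proc_benchmark.json") as f:
#     print(summarize(f.read()))
```

The flat dot-separated key names make it straightforward to feed these logs into whatever visualization or regression-tracking tool you already use.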
To use ros2_benchmark and better understand the performance of graphs of nodes in your ROS 2 applications, clone the repositories you need into your ROS workspace and build them from source with colcon alongside your other ROS 2 packages.
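Under typical assumptions (a Humble install in /opt/ros/humble and a workspace at ~/ros2_ws; both paths are assumptions to adjust for your setup), the steps look roughly like:

```shell
# Hypothetical workspace paths; adjust to your environment.
mkdir -p ~/ros2_ws/src
cd ~/ros2_ws/src
git clone https://github.com/NVIDIA-ISAAC-ROS/ros2_benchmark.git
cd ~/ros2_ws
source /opt/ros/humble/setup.bash   # make ROS 2 Humble available
colcon build --symlink-install --packages-up-to ros2_benchmark
source install/setup.bash           # overlay the freshly built packages
```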
Thanks