RobotPerf Benchmarks "alpha" release, preliminary data and conclusions

A few months after introducing the project, our team at Acceleration Robotics is thrilled to announce the release of our contributions to the RobotPerf benchmark suite, an open, fair, and technology-agnostic robotics computing benchmarking suite. Built on ROS 2, RobotPerf aims to bring together a consortium of robotics leaders from industry, academia, and research labs committed to creating unbiased benchmarks that enable robotic architects to compare the performance of various robotics computing components.

Why RobotPerf?

RobotPerf aims to help answer the question: “Which computing platform(s) should I use for my new robot?”

With the diverse combinations of robotics hardware and software, there’s a pressing need for an accurate and reliable standard of performance. RobotPerf provides a tangible, vendor- and technology-agnostic way for roboticists and technology vendors to evaluate their systems, understand trade-offs between different algorithms, and ultimately build more effective robotic systems.

Our contribution to the RobotPerf benchmarking suite highlights the continuing evolution of robotics computing performance. The suite, designed to exercise hardware components such as CPUs, GPUs, FPGAs, and other compute accelerators, showcases the versatility and adaptability of robotic systems across diverse robotic computing workloads including perception, localization, control, manipulation, navigation, and more to come.

Importantly, the suite is evaluated with technology from the top silicon vendors in robotics, including AMD, Intel, Qualcomm, and NVIDIA. Our contributions in this arena reflect our engineering team’s commitment to meet the challenge head-on and deliver a truly representative and reproducible benchmarking suite: one that provides reproducible and representative performance evaluations of a robotic system’s computing capabilities.

Two performance benchmarking approaches considered

We’ve enabled performance benchmarks in robotic systems through two benchmarking approaches: grey-box and black-box benchmarks.

Black-Box Benchmarking: A Layer-focused Approach

Black-box benchmarking takes a layer-focused approach to performance evaluation: the layers above the layer of interest are removed and replaced with a specific test application. This gives a distinct, external view of the performance metrics of the system.

The constraint of this approach, however, is that the original “application” must be removed to test the system; otherwise, the original application and the test application could interfere, leading to skewed results. Used this way, black-box benchmarking effectively assesses the performance of individual components without the complexities introduced by the broader system.
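To make this concrete, below is a minimal sketch of a black-box probe written with rclpy. This is not RobotPerf’s or ros2_benchmark’s actual code: the topic names, the message type, and the assumption that the layer under test echoes back the input frame_id are all hypothetical placeholders. The point is only that both the stimulus and the measurement sit entirely outside the layer being benchmarked.

```python
# Minimal black-box probe sketch (not the actual ros2_benchmark code): the
# application above the layer under test is replaced by a test node that
# feeds stamped inputs and measures latency/throughput purely from outside.
# Topic names, message type and the frame_id echo are hypothetical.
import time

import rclpy
from rclpy.node import Node
from std_msgs.msg import Header


class BlackBoxProbe(Node):
    def __init__(self):
        super().__init__('black_box_probe')
        self.sent = {}            # sequence id -> send time (monotonic)
        self.latencies_ms = []
        self.seq = 0
        self.received = 0
        self.start = time.monotonic()
        # Stimulus: publish an input message at a fixed rate (10 Hz here).
        self.pub = self.create_publisher(Header, 'input_under_test', 10)
        self.sub = self.create_subscription(
            Header, 'output_under_test', self.on_output, 10)
        self.stimulus_timer = self.create_timer(0.1, self.send_input)
        self.report_timer = self.create_timer(5.0, self.report)

    def send_input(self):
        msg = Header()
        msg.frame_id = str(self.seq)
        self.sent[self.seq] = time.monotonic()
        self.seq += 1
        self.pub.publish(msg)

    def on_output(self, msg):
        # End-to-end latency of the layer under test, as seen externally,
        # assuming the layer echoes the input frame_id on its output.
        t_sent = self.sent.pop(int(msg.frame_id), None)
        if t_sent is not None:
            self.received += 1
            self.latencies_ms.append((time.monotonic() - t_sent) * 1e3)

    def report(self):
        if not self.latencies_ms:
            return
        mean = sum(self.latencies_ms) / len(self.latencies_ms)
        fps = self.received / (time.monotonic() - self.start)
        self.get_logger().info(
            f'throughput: {fps:.1f} FPS | mean latency: {mean:.2f} ms | '
            f'unmatched: {self.seq - self.received} of {self.seq} sent')


def main():
    rclpy.init()
    rclpy.spin(BlackBoxProbe())
    rclpy.shutdown()


if __name__ == '__main__':
    main()
```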

Grey-Box Benchmarking: Harnessing Internal Insights

On the other hand, grey-box benchmarking takes a more application-specific approach. It observes the internal states of the system, measuring specific points in it and thereby generating performance data with minimal interference. This involves instrumenting the entire application, a detailed process that provides a comprehensive view of system performance.

Grey-box benchmarking is powerful because it can provide insights that are not apparent in a purely external evaluation. By instrumenting the complete application, it yields data that ties directly to the performance of the system in real-world scenarios, resulting in a more thorough and nuanced performance evaluation.
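As a simplified illustration of what “instrumenting the application” means, here is a sketch of a node with probes placed at internal points of its callback. RobotPerf’s actual grey-box benchmarks instrument the graph with tracing (ros2_tracing-style tracepoints); this stand-in with plain timestamps, hypothetical topic names, and a placeholder process() step only conveys the idea.

```python
# Minimal grey-box sketch: probes live *inside* the application itself, here
# as monotonic timestamps around an internal processing stage. A simplified
# stand-in for tracing-based instrumentation; names and the process()
# placeholder are hypothetical.
import time

import rclpy
from rclpy.node import Node
from std_msgs.msg import Header


class InstrumentedFilter(Node):
    def __init__(self):
        super().__init__('instrumented_filter')
        self.sub = self.create_subscription(
            Header, 'camera/input', self.on_input, 10)
        self.pub = self.create_publisher(Header, 'camera/output', 10)
        self.stage_ms = []
        self.report_timer = self.create_timer(5.0, self.report)

    def on_input(self, msg):
        t_in = time.monotonic()     # probe: callback entry
        self.process(msg)           # internal stage being measured
        t_out = time.monotonic()    # probe: stage exit
        self.stage_ms.append((t_out - t_in) * 1e3)
        self.pub.publish(msg)

    def process(self, msg):
        time.sleep(0.005)           # placeholder for real perception work

    def report(self):
        if not self.stage_ms:
            return
        s = sorted(self.stage_ms)
        self.get_logger().info(
            f'stage latency p50: {s[len(s) // 2]:.2f} ms, '
            f'max: {s[-1]:.2f} ms ({len(s)} samples)')


def main():
    rclpy.init()
    rclpy.spin(InstrumentedFilter())
    rclpy.shutdown()


if __name__ == '__main__':
    main()
```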

In conclusion, both black-box and grey-box benchmarks play pivotal roles in RobotPerf, providing different perspectives on robotics computing performance. Used effectively, they give a comprehensive and insightful view of system performance, leading to improvements and enhancements in the field of robotics.

Results

The alpha release includes preliminary results collected across benchmarks, reported in the following format:

[Figures: RobotPerf alpha a1 latency, a1 power, and a1 throughput results]

All data and results are available in the RobotPerf benchmarks GitHub repository. A report containing the alpha results is available here.

Early conclusions drawn from data

We’re still processing the data while we work on the paper, but there are already some interesting observations worth sharing to kick off the conversation in the community. Find the most interesting ones below (and look forward to the paper if you’re interested in all of them):

Observation 1: Throughput reports can be very misleading when considered in isolation and may not be representative of the actual performance of a robotic system.

When measuring throughput on state-of-the-art CPUs and GPUs (including the latest open-source hardware accelerators for them) while benchmarking with the black-box ros2_benchmark implementation, we noticed that the reported throughput (measured in FPS) explodes artificially and, to our understanding, is inconsistent with the latency metrics.

Note how the reported throughput metric is totally inconsistent with the latencies observed. Intuition hints that a robotics computing pipeline taking an average (mean) of 88 ms to execute its computations will deliver about 10-15 FPS of output, much less if one considers the stricter max. latency observed (197.53 ms). This reasoned 10-15 FPS is very far from the 220.91 FPS reported by the benchmarks conducted, which hints at an issue. That, together with the percentage of messages lost, hints that further work must be put into determining the usability of the data produced by pipelines reporting such high FPS values.
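For reference, here is the back-of-the-envelope check behind that 10-15 FPS figure, assuming a serial pipeline that starts a new frame only once the previous one has finished (no frames in flight in parallel):

```python
# Sanity check of the reported numbers, assuming a serial (non-pipelined) graph.
mean_latency_s = 0.088       # 88 ms mean latency reported
max_latency_s = 0.19753      # 197.53 ms max latency reported
reported_fps = 220.91        # throughput reported by the black-box run

print(f'expected FPS from mean latency: {1 / mean_latency_s:.1f}')  # ~11.4
print(f'expected FPS from max latency:  {1 / max_latency_s:.1f}')   # ~5.1
print(f'reported FPS: {reported_fps}')  # roughly 20x higher than expected
```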

Community members should be wary of this, as it is often what’s reported in the performance section of some silicon vendors (e.g. this one), and it doesn’t match reality. They use tricks while collecting latency (like discarding the max. and min. values), accept up to 5% of messages lost, and/or don’t report latency clearly (or obscure it).

Observation 2: CycloneDDS seems to be the most robust DDS implementation across the benchmarks tested

Across automated benchmark runs in both simple and complex pipelines, such as the following graph representative of the a1 perception benchmark,

we found that CycloneDDS outperforms all other tested DDS implementations in terms of stability, low latency, and determinism. We pushed a few data points concerning observations we collected when comparing it to Fast-DDS (which is still the default in ROS 2), encountering the following (a minimal snippet for switching the middleware follows the list below):

  • CycloneDDS’s better determinism and lower latency are very significant on both workstation and edge/embedded targets.
  • On edge devices, the differences are even more radical, with certain graphs not working at all with Fast-DDS while working with other DDS implementations. For example, on edge targets like the Jetson Nano, the Kria KR260, or the RB5, the differences between Fast-DDS and CycloneDDS are, for some reason, even more abrupt.
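For anyone reproducing the comparison, below is a minimal sketch of switching the middleware for a run. RMW_IMPLEMENTATION is the standard ROS 2 selection mechanism (in practice you would usually just export it in the shell before launching the benchmark); the node name here is a hypothetical placeholder.

```python
# Minimal sketch: selecting the DDS/RMW implementation for a benchmark run.
# RMW_IMPLEMENTATION must be set before the ROS 2 client library loads the
# middleware, hence it is set before importing rclpy here.
import os

os.environ['RMW_IMPLEMENTATION'] = 'rmw_cyclonedds_cpp'  # or 'rmw_fastrtps_cpp'

import rclpy
from rclpy.utilities import get_rmw_implementation_identifier


def main():
    rclpy.init()
    node = rclpy.create_node('rmw_check')  # hypothetical node name
    node.get_logger().info(f'running on: {get_rmw_implementation_identifier()}')
    node.destroy_node()
    rclpy.shutdown()


if __name__ == '__main__':
    main()
```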

Contribute and help build a better RobotPerf

We are just getting started and there are certainly many things to improve and data to curate. Help us!
If you’re interested in participating and collaborating to produce benchmarks, join us!

Other things you can do to help us:

  • Review the data and help curate it (open an issue and report your findings)
  • Help us analyze the data
  • Provide input about new benchmarks that you’d like to see

The RobotPerf benchmarks will have periodic releases, each of which will include the latest accepted results. For RobotPerf result evaluation, any participant can provide input to the project through the RobotPerf benchmarks GitHub repository. Once all participants complete their individual testing, the results will be peer-reviewed by a committee for accuracy and to help guide the industry.
