[REP-2014] RFC - Benchmarking performance in ROS 2

My first read of the feedback suggests there are some good points in here, but also a fair amount of bias that doesn’t serve the best interest of the ROS 2 community.

I would encourage everyone to be proactive and constructive while thinking about these matters from a ROS perspective. After all, we’re writing a REP. Try bringing your thoughts to REP-2014 as suggestions that can be easily reviewed, so that we can find consensus and advance the document.

There’s no real argument in here that I can grasp other than “remove tracing”. Against that argument, there is relevant prior work [1][2][3] (led by community members) demonstrating that low-overhead tracers are a great fit for robotics and ROS. One must also note that the ROS 2 core stack (rmw, rcl, rclcpp) is already instrumented with tracepoints, so following the same approach only makes sense.

Finally, note also that tracepoints can be set outside of the “entity under test”. You can place your tracepoints in publishers/subscribers/(other abstractions) outside of your algorithm and use that approach to perform benchmarks. This is similar to the argument I’ve been making about taking functional data from a system (extracted from the ROS graph) and landing it in the trace file for functional performance benchmarking. Both are technically possible and likely a better choice than using other tools.
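To make that concrete, here is a minimal sketch of such an external probe, written as a standalone rclcpp node. It subscribes to the input and output topics of the component under test and matches messages by header stamp to compute end-to-end latency. The topic names and the sensor_msgs/msg/Image type are assumptions for illustration, and the in-memory timestamp bookkeeping stands in for what would ideally be tracepoints feeding the same trace as the ros2_tracing instrumentation.

```cpp
// Hypothetical external ("outside the entity under test") probe node.
// Topic names and message type are illustrative assumptions.
#include <cstdint>
#include <memory>
#include <unordered_map>

#include <builtin_interfaces/msg/time.hpp>
#include <rclcpp/rclcpp.hpp>
#include <sensor_msgs/msg/image.hpp>

class LatencyProbe : public rclcpp::Node
{
public:
  LatencyProbe()
  : Node("latency_probe")
  {
    input_sub_ = create_subscription<sensor_msgs::msg::Image>(
      "input", 10,
      [this](sensor_msgs::msg::Image::ConstSharedPtr msg) {
        // Probe placed *before* the component under test: record arrival time.
        arrival_[key(msg->header.stamp)] = now();
      });
    output_sub_ = create_subscription<sensor_msgs::msg::Image>(
      "output", 10,
      [this](sensor_msgs::msg::Image::ConstSharedPtr msg) {
        // Probe placed *after* the component under test: compute latency.
        auto it = arrival_.find(key(msg->header.stamp));
        if (it != arrival_.end()) {
          const rclcpp::Duration latency = now() - it->second;
          RCLCPP_INFO(get_logger(), "latency: %.3f ms", latency.seconds() * 1e3);
          arrival_.erase(it);
        }
      });
  }

private:
  // Key messages by header stamp so input and output can be matched.
  static uint64_t key(const builtin_interfaces::msg::Time & t)
  {
    return static_cast<uint64_t>(t.sec) * 1000000000ULL + t.nanosec;
  }

  rclcpp::Subscription<sensor_msgs::msg::Image>::SharedPtr input_sub_;
  rclcpp::Subscription<sensor_msgs::msg::Image>::SharedPtr output_sub_;
  std::unordered_map<uint64_t, rclcpp::Time> arrival_;
};

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  rclcpp::spin(std::make_shared<LatencyProbe>());
  rclcpp::shutdown();
  return 0;
}
```

Nothing in the component under test changes; the probe only observes the ROS graph, which is exactly the kind of black/grey-box measurement discussed below.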

My experience benchmarking while developing acceleration kernels and mixing accelerators is that you’ll often want additional visibility into the dataflow, and that’s where the tracing approach shines as a means to benchmark.

I think there’s merit in this input, but I’m a bit biased (e.g. we have various accelerators which are opaque, such as ROBOTCORE Perception or ROBOTCORE Transform; then again, as hinted in past HAWG meetings, this can easily lead to people over-architecting for benchmarks, whereas transparency may rule that out) and I’d love to see benchmarking in alignment with REP-2014, so I’d like to hear more community feedback about it.

I think we could design the instrumentation so that both opaque and transparent tests are possible. One way to go about this may lead us to extend the corresponding section in REP-2014 as follows:

         Probe      Probe
         +            +
         |            |
+--------|------------|-------+     +-----------------------------+
|        |            |       |     |                             |
|     +--|------------|-+     |     |                             |
|     |  v            v |     |     |        - latency   <--------------+ Probe
|     |                 |     |     |        - throughput<--------------+ Probe
|     |     Function    |     |     |        - memory    <--------------+ Probe
|     |                 |     |     |        - power     <--------------+ Probe
|     +-----------------+     |     |                             |
|      System under test      |     |       System under test     |
+-----------------------------+     +-----------------------------+


          Functional                            Non-functional


+-------------+                     +----------------------------+
| Test App.   |                     |  +-----------------------+ |
|  + +  +  +  |                     |  |    Application        | |
+--|-|--|--|--+---------------+     |  |                   <------------+ Probe
|  | |  |  |                  |     |  +-----------------------+ |
|  v v  v  v                  |     |                            |
|     Probes                  |     |                      <------------+ Probe
|                             |     |                            |
|       System under test     |     |   System under test        |
|                             |     |                      <------------+ Probe
|                             |     |                            |
|                             |     |                            |
+-----------------------------+     +----------------------------+


         Black-Box                            Grey-box



    Probe      Probe     Probe             Probe                     Probe
    +          +          +       +-------+                          |
    |          |          |       |                                  |
+-----------------------------+   | +-----------------------------+  |
|   |          |          |   |   | |                             |  |
|   | +-----------------+ |   |   | |                             |  |
|   | |        v        | |   |   | |                             |  |
|   | |                 | |   |   | |                             |  |
|   +->     Function    +<+   |   +>+                             +<-+
|     |                 |     |     |                             |
|     +-----------------+     |     |                             |
|      System under test      |     |       System under test     |
+-----------------------------+     +-----------------------------+


            Transparent                           Opaque
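To illustrate the transparent case in the last diagram, here is a rough sketch of how the function under test could expose its own probes via LTTng-UST tracepoints, so that per-call latency can be recovered from the same CTF trace as the ros2_tracing events (e.g. with babeltrace2 or Trace Compass). The provider and event names (benchmark, function_start, function_end) are made up for illustration, and this uses the classic lttng-ust tracepoint macro names (newer lttng-ust releases prefer LTTNG_UST_-prefixed equivalents). The opaque case would instead rely only on the external probes around the publishers/subscribers, as in the node sketched earlier.

```cpp
// benchmark_tp.h: hypothetical tracepoint provider for the function under test
// (classic lttng-ust tracepoint provider header pattern).
#undef TRACEPOINT_PROVIDER
#define TRACEPOINT_PROVIDER benchmark

#undef TRACEPOINT_INCLUDE
#define TRACEPOINT_INCLUDE "./benchmark_tp.h"

#if !defined(BENCHMARK_TP_H) || defined(TRACEPOINT_HEADER_MULTI_READ)
#define BENCHMARK_TP_H

#include <stdint.h>
#include <lttng/tracepoint.h>

TRACEPOINT_EVENT(
  benchmark,
  function_start,
  TP_ARGS(uint64_t, iteration_arg),
  TP_FIELDS(ctf_integer(uint64_t, iteration, iteration_arg))
)

TRACEPOINT_EVENT(
  benchmark,
  function_end,
  TP_ARGS(uint64_t, iteration_arg),
  TP_FIELDS(ctf_integer(uint64_t, iteration, iteration_arg))
)

#endif  /* BENCHMARK_TP_H */

#include <lttng/tracepoint-event.h>

// Usage in the component under test (in exactly one translation unit):
//
//   #define TRACEPOINT_DEFINE
//   #include "benchmark_tp.h"
//
//   void function_under_test(uint64_t iteration)
//   {
//     tracepoint(benchmark, function_start, iteration);
//     // ... the actual computation, possibly offloaded to an accelerator ...
//     tracepoint(benchmark, function_end, iteration);
//   }
//
// Link against lttng-ust (e.g. -llttng-ust -ldl).
```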

I like the ideas in here and we should find a way to include them in REP-2014. I don’t think we want to remove the possibility of testing against live systems, as this seems to be a feature required by some maintainers (e.g. @smac). Favouring consistency of the input data in benchmarks via rosbags makes sense to me; a sketch of bag-driven input is below.
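As a minimal sketch of what bag-driven input could look like, the snippet below reads a recorded bag and republishes its messages to the system under test, so every run sees identical input data. The bag name (benchmark_input_bag), topic (/camera/image_raw) and message type are assumptions for illustration, and replay pacing is omitted for brevity (ros2 bag play can also be used directly).

```cpp
// Hypothetical bag feeder: replays recorded input data for repeatable benchmarks.
#include <rclcpp/rclcpp.hpp>
#include <rclcpp/serialization.hpp>
#include <rclcpp/serialized_message.hpp>
#include <rosbag2_cpp/reader.hpp>
#include <sensor_msgs/msg/image.hpp>

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = rclcpp::Node::make_shared("bag_feeder");
  auto pub = node->create_publisher<sensor_msgs::msg::Image>("input", 10);

  rclcpp::Serialization<sensor_msgs::msg::Image> serialization;
  rosbag2_cpp::Reader reader;
  reader.open("benchmark_input_bag");  // assumed bag URI

  while (rclcpp::ok() && reader.has_next()) {
    auto bag_msg = reader.read_next();
    if (bag_msg->topic_name != "/camera/image_raw") {  // assumed topic
      continue;
    }
    // Deserialize the stored message and republish it to the system under test.
    rclcpp::SerializedMessage serialized(*bag_msg->serialized_data);
    sensor_msgs::msg::Image msg;
    serialization.deserialize_message(&serialized, &msg);
    pub->publish(msg);
  }

  rclcpp::shutdown();
  return 0;
}
```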

+1 to these two, but they seem to me to be a benchmark-implementation aspect more than anything else. Maybe worth mentioning it in the text? I’d be happy to review contributions.


  1. ros2_tracing: Multipurpose Low-Overhead Framework for Real-Time Tracing of ROS 2, IEEE Robotics and Automation Letters (IEEE Xplore)

  2. Analyze, Debug, Optimize: Real-Time Tracing for Perception and Mapping Systems in ROS 2 (arXiv:2204.11778)

  3. Message Flow Analysis with Complex Causal Links for Distributed ROS 2 Systems (arXiv:2204.10208)
