[REP-2014] RFC - Benchmarking performance in ROS 2

vmayoral · October 13, 2022, 9:45pm

Hey all,

I’d like to bring everyone’s attention to a new REP ready to be reviewed and receive input from the wider ROS community: [REP-2014] Benchmarking performance in ROS 2. This proposal is inspired by and builds on the work of many others in this community, and proposes a reference benchmarking approach for ROS 2 systems that is already being used in real scenarios including perception and mapping ^[1], hardware acceleration ^[2] ^[3] or self-driving mobility ^[4].

Sharing the motivation section below but encouraging everyone to read the full draft in the PR:

Benchmarking is the act of running a computer program to assess its relative performance. In the context of ROS 2, performance information can help robotists design more efficient robotic systems and select the right hardware for their robotic application. It can also help understand the trade-offs between different algorithms that implement the same capability, and help them choose the best approach for their use case. Performance data can also be used to compare different versions of ROS 2 and to identify regressions. Finally, performance information can be used to help prioritize future development efforts.

The myriad combinations of robot hardware and robotics software make assessing robotic-system performance in an architecture-neutral, representative, and reproducible manner challenging. This REP attempts to provide some guidelines to help robotists benchmark their systems in a consistent and reproducible manner by following a quantitative approach. This REP also provides a set of tools and examples to help guide robotists while collecting and reporting performance data.

Value for stakeholders:

Package maintainers can use these guidelines to integrate performance benchmarking data in their packages.

Consumers can use the guidelines in the REP to benchmark ROS Nodes and Graphs in an architecture-neutral, representative, and reproducible manner, as well as the corresponding performance data offered in ROS packages to set expectations on the capabilities of each.

Hardware vendors and robot manufacturers can use these guidelines to show evidence of the performance of their systems solutions with ROS in an architecture-neutral, representative, and reproducible manner.

Reviews, comments and thoughts are very welcome.

Lajoie, Pierre-Yves, Christophe Bédard, and Giovanni Beltrame. “Analyze, Debug, Optimize: Real-Time Tracing for Perception and Mapping Systems in ROS 2.” arXiv preprint arXiv:2204.11778 (2022). ↩︎
Mayoral-Vilches, V., Neuman, S. M., Plancher, B., & Reddi, V. J. (2022). “RobotCore: An Open Architecture for Hardware Acceleration in ROS 2”.
https://arxiv.org/pdf/2205.03929.pdf ↩︎
Mayoral-Vilches, V. (2021). “Kria Robotics Stack”.
https://www.xilinx.com/content/dam/xilinx/support/documentation/white_papers/wp540-kria-robotics-stack.pdf ↩︎
Li, Zihang, Atsushi Hasegawa, and Takuya Azumi. “Autoware_Perf: A tracing and performance analysis framework for ROS 2 applications.” Journal of Systems Architecture 123 (2022): 102341. ↩︎

ggrigor · November 3, 2022, 3:59pm

Good to see this proposal. We have feedback on the draft with the intent of making this proposal more broadly applicable to benchmarking hardware acceleration.

Tracing into a separate REP

REP-2014 should remove tracing as it is independent of benchmarking. Adding probes alters the entity under test, and should not be used in benchmark testing the entity. It is very useful for triage, debug, and optimization for developers to make changes to benchmark results, but is not required for objective benchmark(s).

Hence feedback is focused on benchmarking, not tracing.

Unbiased names

Use of terms like “black-box” and “grey-box” make implications (implicit or explicit) that one color is better than another; this propagates a bias and can be construed negatively. Recommend the use of unbiased naming such as opaque, and transparent be used in place of colors.

Opaque performance tests

The requirement that packages be instrumented within the source, prevents a common benchmark from being used where the implementer needs to recompile the source with added probes which could affect the results. There is considerable prior-art in industry benchmarking that no source is required to assess the performance of an entity under performance test.

For example one can evaluate the acceleration performance or fuel efficiency of two cars without looking under the hood using external measurement.

Measurement can be performed at the node level and graph of nodes, by monitoring subscriptions to topics.

A benchmark should be performed as an opaque performance test(s).
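
As a rough illustration of the opaque approach described above, here is a minimal Python sketch that times an entity purely from the outside. It assumes the entity under test can be driven as a plain callable; in a real ROS 2 setup the two timestamps would be taken at the publishing and subscribing side of the topics. The names `measure_opaque_latency` and `summarize` are hypothetical and not part of REP-2014 or any ROS package.

```python
import time
from statistics import mean


def measure_opaque_latency(entity, inputs, clock=time.perf_counter):
    """Time each call from the outside; the entity itself is never modified."""
    latencies = []
    for item in inputs:
        start = clock()
        entity(item)  # treated as an opaque callable (assumption)
        latencies.append(clock() - start)
    return latencies


def summarize(latencies):
    """Reduce raw per-input latencies to a few summary values."""
    ordered = sorted(latencies)
    return {"mean": mean(ordered), "min": ordered[0], "max": ordered[-1]}
```

The `clock` parameter is only there so that the measurement source can be swapped (for example for a fake clock when checking the harness itself).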

Input data

Performance measurement requires the system to perform some function, which requires input data to operate upon. For some functions input data can be repeated and for others input data need to be sequential.

To benchmark a function, input data needs to be provided with a data loader containing real or synthetic data from a file, rosbag, or directly from the sensor (live).

Without consistency on the input data used, any benchmark measurements performed independently cannot be compared, thus the performance measured is only applicable to the person measuring it, as it’s not reproducible by others.
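
One possible way to make input consistency checkable, sketched in Python: publish a fingerprint of the input data alongside the results so others can confirm they replayed the same inputs. This assumes the input is available as an ordered sequence of serialized messages (bytes); `dataset_fingerprint` is a hypothetical helper, not an existing rosbag API.

```python
import hashlib


def dataset_fingerprint(messages):
    """SHA-256 over an ordered sequence of serialized messages (bytes)."""
    digest = hashlib.sha256()
    for payload in messages:
        # The length prefix makes message boundaries part of the hash.
        digest.update(len(payload).to_bytes(8, "big"))
        digest.update(payload)
    return digest.hexdigest()
```

Two runs that report the same fingerprint used the same messages in the same order; a different order or a different split of the same bytes gives a different value.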

Output data check

Performance measurement requires confirmation of work completed during measurement of time spent on the work. When there is no assessment of work completed, optimization can inadvertently, or deliberately lead to improvement with functional errors.

For example, we measured an impressive 3x improvement in CPU performance of AprilTag but had to disable our quality check on work completed to measure this result. The improvement resulted in decreased detections, which failed the quality check.

Benchmark tooling needs to perform a minimal check of work results.
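
A minimal sketch of such a check in Python, assuming each output is a list of detections and that a minimum expected detection count per input is known in advance. The function name and the pass criterion are illustrative assumptions, not our actual tooling.

```python
import time


def run_checked_benchmark(entity, inputs, expected_min_detections,
                          clock=time.perf_counter):
    """Return (elapsed_seconds, passed) for one run over `inputs`."""
    start = clock()
    outputs = [entity(item) for item in inputs]
    elapsed = clock() - start
    # Minimal work check: every input must yield at least the expected
    # number of detections, so a faster run that detects less is rejected.
    passed = len(outputs) == len(expected_min_detections) and all(
        len(out) >= minimum
        for out, minimum in zip(outputs, expected_min_detections)
    )
    return elapsed, passed
```

A timing is only reported as valid when `passed` is true, which is the situation the AprilTag example above would have failed.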

Benchmark parameters

Benchmarks need customizable parameters for the entity(s) under test. Parameters are used for data set size | length, input test data, and publishing rate; when performing throughput testing we need to identify the peak throughput rate, within a specified tolerance of drops in work from the entity (i.e. DDS or node drops).
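
To make the peak-throughput search concrete, here is a hedged Python sketch that binary-searches the publishing rate. It assumes a caller-supplied function `drop_ratio_at(rate_hz)` that runs the entity at a given rate and returns the fraction of dropped work, and it assumes that fraction does not decrease as the rate goes up. All names and parameters are hypothetical.

```python
def find_peak_rate(drop_ratio_at, low_hz, high_hz, tolerance,
                   resolution_hz=1.0):
    """Binary-search the highest rate whose drop ratio stays within tolerance.

    Assumes the drop ratio does not decrease as the rate increases.
    Returns None if even low_hz exceeds the tolerance.
    """
    if drop_ratio_at(low_hz) > tolerance:
        return None
    if drop_ratio_at(high_hz) <= tolerance:
        return high_hz
    # Invariant: low_hz is within tolerance, high_hz is not.
    while high_hz - low_hz > resolution_hz:
        mid = (low_hz + high_hz) / 2.0
        if drop_ratio_at(mid) <= tolerance:
            low_hz = mid
        else:
            high_hz = mid
    return low_hz
```

The tolerance, the rate bounds and the resolution are exactly the kind of parameters that would need to be reported with the result for it to be comparable.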

Remove interpretation of results

Performance measurement needs to be performed by highly trusted scientific devices providing objective measurement(s). The analysis of results should be left to those making decisions from the measurements.

The proposal for the REP makes “Performance metrics in robotics” statements around how to interpret results. While these statements may be provided with good intentions as examples, they should not be part of the REP as they contain inherent bias on how to interpret results. Guidance on interpretation can be provided separately on how to perform analysis of measurements.

The introduction to the REP can include justification on the value of having objective measurements for those performing analysis with their own criteria, without results interpretation or conclusions to draw from objective measurements.

All references and examples of results interpretation should be removed, so this is a trusted objective measurement, to avoid having a biased system.

In summary there are several issues to consider and address in REP-2014

Remove tracing from this REP as it’s independent of benchmarking
Use unbiased names
Opaque performance tests
Input data loader and data
Output data monitor and checker
Benchmark parameters
Remove interpretation of results

We run ~200 benchmarks nightly in cloud native Kubernetes systems to provide objective performance measurements, across 4 different compute platforms, including 2 different instruction set architectures for nodes and graphs of nodes in ROS 2. These measurements are used to analyze our work on hardware acceleration. To replace what we use in practice to provide great hardware acceleration into ROS 2, we would like us all to use a level playing field and address the issues above.

Thank you

vmayoral · November 4, 2022, 7:44am

A small update here: I’ve gone ahead and processed all previous comments in the REP PR and provided in the initial draft of REP-2014. Thanks everyone for the valuable feedback. We also discussed this same document in the recent HAWG#12 meeting (2022-11-03T17:00:00Z) and received no objections or further comments about its status.

Thanks for the feedback @ggrigor, I think there are some sections above which definitely add value. Do you think you could bring this into the REP-2014 PR constructively? So that taking further input stays manageable for everyone else, further input and contributions can be sent to https://github.com/ros-infrastructure/rep/pull/364 as suggestions. This way we can process input easily and discuss it with the rest of the group.

I believe the document is in good shape to move forward and be reviewed by the TSC for comments/approval-vote. This REP is of Informational Type so the process to formalize it might be a bit different. @Katherine_Scott, @clalancette and Open Robotics team, how can we move forward with this?

clalancette · November 4, 2022, 12:34pm

At the moment it looks like there is still open discussion about what to include and not include in the document. Because of that, there isn’t much to do here except continue iterating with the community and making changes to the PR. Once that calms down we can talk about reviewing it and getting it in.

Ingo_Lutkebohle · November 6, 2022, 1:11pm

Maybe this is a terminological issue, so let me explain my understanding of this term, which I believe to be the standard one, and we can see if that’s actually the issue, or if you meant something else.

As a technical term, “tracing” encompasses anything that logs non-functional data about a program’s execution. When the kernel logs data about a process’s CPU usage, this is also tracing. In this case, it is also altering the “entity under test” by taking some time inside the kernel to read out the corresponding CPU register and storing it internally (which is why this kind of statistics can be disabled there). And when a separate program reads /proc to get at this data, this is also altering the entity under test by taking a context switch and some time.

Tracing does not necessarily have an impact in this way, however: there are forms of tracing, such as the Program Trace Macrocell supported by many ARM processors, which can be used by an external agent to record data about the program under test in a way which I am told (having never used this myself) has no impact on the software.

I am not aware of any way to record performance-related data without tracing in this sense. If you have a specific one in mind, it would help if you could share it!

All that said, I surmise that you might be objecting to “tracing” in a more narrow sense, i.e. as referring to things such as ros2_tracing, or LTTng, or ebpf, or perf, or similar tools. I am not the original author of the relevant sentences, but at least I have not understood tracing to be restricted to a specific set of tools. If you feel that some sentences make an implication in that direction, I am sure this can be fixed. Maybe you could make a concrete suggestion.

Finally, I would also like to make a case for mentioning some tracing tools (as examples): There are much worse means for measurement still in use, both regarding potential impact on the measured system, and regarding accuracy. In contrast, the software tracing tools we have linked to in the REP have been carefully optimized and their overheads have been quantified.

Good point. So we would use “opaque” instead of black, and “transparent” instead of white, right? What is an appropriate replacement for “grey-box” then? Semi-transparent?

Yeah, I was also meaning to add statements to that effect (I started a section on reproducibility on Friday, so far only mentioning to remove unintended interferences as much as possible).

That said, I’ve also been wondering whether REP2014 might be turned into a more general “performance measurement” REP, rather than be restricted to benchmarking. Many things are common across those two areas, but the restriction to have an a-priori known input set is not one of them. For example, I often measure overhead, and while doing so I record some data about the kind, number and ordering of input data to compare, but not the exact data (this makes it easy to turn this on and off during operation, and also reduces the impact on the system).

I confess to not really understanding what you mean by this sentence and actually also not the remainder of this section. In particular I’m wondering what you mean by “they contain inherent bias on how to interpret results”. Can you explain this a bit?

Firstly, I’m not really seeing guidance on interpretation in that section. The section contains example metrics which have been commonly used in existing benchmarks. If we think there is a problem with those, a REP would be a great way of removing those problems. Please share your concerns!

Secondly, we could of course split into yet another REP, but there is also a danger in having more documents than necessary, because it confuses people, and of course, the ensuing discussion might actually have an impact on this REP. If we think that it is useful to have examples and guidance on interpretation, I suggest we do it together with how to measure.

btw, there is no particular hurry here – we shouldn’t take forever, but we also don’t have to force early agreement by postponing potentially contentious issues. If we can integrate this now, the document will be better for it.

christophebedard · November 6, 2022, 11:56pm

I do agree that a lot of – or most – benchmarks only record the input and the output (i.e., they perform “opaque” tests), but why do we have to limit ourselves to that? This seems a bit pointless to me.

I think the intention here is to do more than just an “opaque” performance test. There is also plenty of existing work that does this, and our own work with ros2_tracing demonstrates why it is valuable. Finally, you can still use a low-overhead tracer to record input/output data and benefit from the minimal runtime performance impact.

vmayoral · November 7, 2022, 6:20am

My first read of the feedback hinted that there are some good points in here, but also a lot of bias which doesn’t serve the best interest of the ROS 2 community.

I would encourage everyone to be proactive and constructive while thinking about these matters from a ROS perspective. After all, we’re writing a REP. Try bringing your thoughts to REP-2014 with suggestions that can be easily reviewed so that we can find consensus and advance with the document.

There’s no real argument in here that I can grasp other than “remove tracing”. Against this argument, there’s relevant ^[1]^[2]^[3] (led by community members) prior work which demonstrates how low-overhead tracers are a great fit for robotics and ROS. Also, one must note that the ROS 2 core stack (rmw, rcl, rclcpp) is already instrumented with tracers, so following the same only makes sense.

Finally, note also that tracepoints can be set outside of the “entity under test”. You can set your tracepoints in publishers/subscribers/(other abstractions) outside of your algorithm and use that approach to perform benchmarks. This is similar to the argumentation I’ve been making about using functional data from a system (extracted from the ROS graph) and landing that in the trace file for functional performance benchmarking. Both are technically possible and likely a better choice than using other tools.

My experience benchmarking while developing acceleration kernels and mixing accelerators is that you’ll often want additional visibility into the dataflow, and that’s where the tracing approach shines as a means to benchmark.

ggrigor:

Unbiased names

Use of terms like “black-box” and “grey-box” make implications (implicit or explicit) that one color is better than another; this propagates a bias and can be construed negatively. Recommend the use of unbiased naming such as opaque, and transparent be used in place of colors.

Opaque performance tests

The requirement that packages be instrumented within the source, prevents a common benchmark from being used where the implementer needs to recompile the source with added probes which could affect the results. There is considerable prior-art in industry benchmarking that no source is required to assess the performance of an entity under performance test.

For example one can evaluate the acceleration performance or fuel efficiency of two cars without looking under the hood using external measurement.

Measurement can be performed at the node level and graph of nodes, by monitoring subscriptions to topics.

A benchmark should be performed as an opaque performance test(s).

I think there’s merit in this input but I’m a bit biased (e.g. we have various accelerators which are opaque, such as ROBOTCORE Perception or ROBOTCORE Transform, but then, as hinted in past HAWG meetings, this can easily lead to people over-architecting for benchmarks, whereas transparency may rule that out, and I’d love benchmarking in alignment with REP-2014), so I’d like to hear more community feedback about it.

I think we could design instrumentation so that both opaque and transparent tests can be made. One way to go about this may lead us to extend the section in REP-2014 as follows:

         Probe      Probe
         +            +
         |            |
+--------|------------|-------+     +-----------------------------+
|        |            |       |     |                             |
|     +--|------------|-+     |     |                             |
|     |  v            v |     |     |        - latency   <--------------+ Probe
|     |                 |     |     |        - throughput<--------------+ Probe
|     |     Function    |     |     |        - memory    <--------------+ Probe
|     |                 |     |     |        - power     <--------------+ Probe
|     +-----------------+     |     |                             |
|      System under test      |     |       System under test     |
+-----------------------------+     +-----------------------------+


          Functional                            Non-functional


+-------------+                     +----------------------------+
| Test App.   |                     |  +-----------------------+ |
|  + +  +  +  |                     |  |    Application        | |
+--|-|--|--|--+---------------+     |  |                   <------------+ Probe
|  | |  |  |                  |     |  +-----------------------+ |
|  v v  v  v                  |     |                            |
|     Probes                  |     |                      <------------+ Probe
|                             |     |                            |
|       System under test     |     |   System under test        |
|                             |     |                      <------------+ Probe
|                             |     |                            |
|                             |     |                            |
+-----------------------------+     +----------------------------+


         Black-Box                            Grey-box



    Probe      Probe     Probe             Probe                     Probe
    +          +          +       +-------+                          |
    |          |          |       |                                  |
+-----------------------------+   | +-----------------------------+  |
|   |          |          |   |   | |                             |  |
|   | +-----------------+ |   |   | |                             |  |
|   | |        v        | |   |   | |                             |  |
|   | |                 | |   |   | |                             |  |
|   +->     Function    +<+   |   +>+                             +<-+
|     |                 |     |     |                             |
|     +-----------------+     |     |                             |
|      System under test      |     |       System under test     |
+-----------------------------+     +-----------------------------+


            Transparent                           Opaque

I like the ideas in here and we should find a way to include them in REP-2014. I don’t think we want to remove the possibility of testing from live data, as this seems to be a feature required by some maintainers (e.g. @smac). Favouring consistency of the input data in benchmarks via rosbags makes sense to me.

ggrigor:

Output data check

Performance measurement requires confirmation of work completed during measurement of time spent on the work. When there is no assessment of work completed, optimization can inadvertently, or deliberately lead to improvement with functional errors.

For example, we measured an impressive 3x improvement in CPU performance of AprilTag but had to disable our quality check on work completed to measure this result. The improvement resulted in decreased detections, which failed the quality check.

Benchmark tooling needs to perform a minimal check of work results.

Benchmark parameters

Benchmarks need customizable parameters for the entity(s) under test. Parameters are used for data set size | length, input test data, and publishing rate; when performing throughput testing we need to identify the peak throughput rate, within a specified tolerance of drops in work from the entity (i.e. DDS or node drops).

+1 to these two, but they seem to me a benchmark-implementation aspect more than anything else. Maybe worth mentioning it in the text? I’d be happy to review contributions.

ggrigor · November 11, 2022, 7:27pm

You are correct. Feedback on tracing was more in a narrow sense similar to profiling. It’s a good catch that needs clarification.

For benchmarking to be informative for decision making, it strives to match closely how the software under test would be used in practice. Measurements performed in the benchmark should use the same software and hardware that would be used in practice; this implies we do not re-compile the platform, or software under test to perform the benchmark.

This is why we propose benchmarking and the narrow sense of tracing be separate.

I’ve not run into a need to use anything other than opaque or transparent test naming but semi-transparent seems to fit.

This might address confusion.

Benchmarking uses known inputs and measures results / output. Instrumenting software to understand how it’s performing, to investigate bottlenecks, and improve performance we see as independent. Benchmarks are one of several ways to do performance measurement.

The feedback is in reference to:

For example, a robotic system may be able to perform a task in a short amount of time (low latency), but it may not be able to do it in real-time. In this case, the system would be considered to be non-real-time given the time deadlines imposed. On the other hand, a robotic system may be able to perform a task in real-time, but it may not be able to do it in a short amount of time. In this case, the system would be considered to be non-interactive. Finally, a robotic system may be able to perform a task in real-time and in a short amount of time, but it may consume a lot of power. In this case, the system would be considered to be non-energy-efficient.

Which should be reworded to remove interpretation as:

For example, a robotic system may be able to perform a task in a short amount of time (low latency), but it may not be able to do it in real-time. On the other hand, a robotic system may be able to perform a task in real-time, but it may not be able to do it in a short amount of time. Finally, a robotic system may be able to perform a task in real-time and in a short amount of time, but it may consume a lot of power. These are all tradeoffs which need to be considered when reviewing performance results.

And

In another example, a robotic system that can perform a task in 1 second with a power consumption of 2W is twice as fast (latency) as another robotic system that can perform the same task in 2 seconds with a power consumption of 0.5W. However, the second robotic system is twice as efficient as the first one. In this case, the solution that requires less power would be the best option from an energy efficiency perspective (with a higher performance-per-watt). Similarly, a robotic system that has a high bandwidth but consumes a lot of energy might not be the best option for a mobile robot that must operate for a long time on a battery.

In another example, a robotic system configuration that can perform a task in 1 second with a power consumption of 2W is twice as fast as a robotic system configuration that can perform the same task in 2 seconds with a power consumption of 0.5W. However, the second robotic system is twice as efficient as the first, provided both meet the real-time requirement.
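
For readers who want to check the numbers in this example, here is a small Python sketch of the arithmetic. It assumes constant power draw over the duration of the task and one task at a time; the helper names are made up for illustration.

```python
def energy_per_task_j(latency_s, power_w):
    """Energy spent on one task, assuming constant power draw."""
    return latency_s * power_w


def tasks_per_second_per_watt(latency_s, power_w):
    """Performance-per-watt, assuming one task at a time."""
    return (1.0 / latency_s) / power_w


first = {"latency_s": 1.0, "power_w": 2.0}
second = {"latency_s": 2.0, "power_w": 0.5}

first_energy = energy_per_task_j(**first)            # 1 s * 2 W
second_energy = energy_per_task_j(**second)          # 2 s * 0.5 W
first_ppw = tasks_per_second_per_watt(**first)
second_ppw = tasks_per_second_per_watt(**second)
```

Under these assumptions the first configuration spends 2 J per task and the second 1 J, so the second has half the latency performance and twice the performance-per-watt, which matches the "twice as fast" and "twice as efficient" statements.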

Great question. A few reasons.

Modifying the software under test to benchmark it raises the question if modifications are done in an apples-to-apples way, when comparing results between implementations. This question/doubt is eliminated when the software under test is unmodified to benchmark it.

Benchmarking is of value for comparisons where source is not available. The audience for the benchmark becomes more limited by requiring a source.

Benchmarking strives to match closely how software under test is used in practice. The version of software under test should be the same as what is used in practice, and be available for anyone to do an audit of results. While LTTng and many other tools have minimally invasive near zero performance cost and impact on software, our software used in practice does not ship with this enabled by default as the performance cost is not zero. I suspect this is the case for others.

There is great value in guiding developers on how to trace and profile code to improve, and optimize it.

ros2_tracing has the limitation that it is Linux specific, which excludes RTOS, Windows and QNX platforms. How does the REP make this not Linux only?

REP-2014 implies that ros2_tracing can measure the time it takes for moving a task from a CPU to an accelerator in a serial way (i.e. replace CPU task A, with accelerated task A’). Our work on high performance computing strives to provide parallel heterogeneous hardware to increase performance, and reduce latency by allowing multiple operations to be performed in parallel. Unlike the CPU to accelerator approach, we do not wait on the CPU for work to complete until necessary, as visualized below.
[Figure: ros2_humble_typeadaptation]

ros2_tracing does not provide an understanding of the interaction between different heterogeneous hardware performing work. Various vendors have different tools to profile and measure their hardware. ros2_tracing solves for a CPU running Linux, but isn’t solving for understanding high performance parallel computing. Is solving this out of scope for the REP?

Since REP-2014 is informative only, perhaps the input and output data, and checking of output data is out of scope. It doesn’t seem to be relevant unless there is an actual benchmark test provided. Seems this might be better and more broad as “performance” instead of “benchmarking” as mentioned above.

Thanks.

debjit · November 11, 2022, 8:32pm

FWIW, here’s an article on benchmarking ROS2 that was published by RTI March 2021. I’ve used RTI’s Connext DDS, it’s solid.

Ingo_Lutkebohle · November 12, 2022, 3:49pm

@ggrigor, thank you for the clarifications. I think for your bias-related suggestions, and for the comments regarding checking of work we have agreement. We can integrate changes to that effect, or if you’d care to submit comments on the PR, that would be most welcome.

Firstly, I still see no sentence in the REP that requires use of ros2_tracing. If you interpret this otherwise, please let us know (like as a comment on the PR).

Secondly, while the current implementation of ros2_tracing has some Linux-specific parts in it, the architecture of ros2_tracing can certainly integrate other trace data sources. We have specifically foreseen this, because in the micro-ROS project it was originally intended to also integrate data from ARM trace macro cells. This didn’t materialize, unfortunately, but it would be doable.

If NVidia would be interested in integrating with ros2_tracing, I’m sure we could come up with something. It probably wouldn’t be a full replacement, but we might achieve something sufficient for generalized benchmarks.

Agreed, but I am sure you would also acknowledge that in practice, many vendors try to gain advantage by quite deep optimizations. For example, in high-performance computing benchmarks, it used to be pretty common to use special optimizing compilers and things like hand-optimized math libraries. This doesn’t modify the source-code of the benchmark as such, but it for sure modifies what is being executed.

When such things are seen as acceptable, I have to wonder whether putting a few tracepoints in really constitutes a problem that has to be ruled out specifically. Again, nobody requires you to do it.

vmayoral · November 13, 2022, 9:28am

Thanks everyone for the feedback provided in here so far. We’ll be discussing the input in the upcoming Hardware Acceleration WG, meeting #13 (LinkedIn event). I’ll specifically block a big chunk of the meeting for it and prepare a summary of the most relevant items discussed above. We’ll go through each one of them to collect the group’s input. Bring questions and/or additional thoughts to the meeting please!

A few remarks from my side from the discussion above:

I believe there’s no need to relax those sentences, and I heard nobody except you who felt that way on a first read (in fact it took a further clarification for us to follow your argument), so I’d like to hear more feedback about this. The message being conveyed is important to educate the reader and stresses the principle that robots are deterministic machines and their performance should be understood by considering various metrics. In my view performance in robotics does not equal throughput (or any other metric in an isolated manner) and this document should instruct how to benchmark performance in robotics. This is also especially important to educate roboticists about compute architectures and hardware acceleration (i.e. there’s no single accelerator that solves all cases and things need to be properly assessed).

This is a wrong understanding of ros2_tracing and, in our group’s experience, the project can be easily extended to support other tracing frameworks (you just need to make sure to meet CTF if you wish to merge/mix traces). The fact that it currently only supports LTTng is due to limited resources (and us all not contributing enough). Note that instrumentation is defined through a series of headers and preprocessor directives which allow you to abstract away OS, frameworks, etc. Have a look at how we instrumented the image_pipeline ROS 2 package for reference.

QNX’s SAT is what you’re looking for and it can be easily enabled in ros2_tracing (that said, I hear some people are using LTTng in QNX). We’re also working with Microsoft’s folks to try and align REP-2014 with Windows ROS 2 deployments.

I think this argument isn’t valid. And again, ROS 2 is already instrumented with ros2_tracing for a reason. Let’s not reinvent the wheel for business’ interests.

This is also wrong (and very much!). Though we can’t claim holistic support between all different heterogeneous hardware (there’s no such thing, unfortunately), ros2_tracing can be extended and used to provide an understanding of the interaction between different heterogeneous hardware. The lowest hanging fruit is leveraging LTTng-HSA, which allows easily using ros2_tracing on AMD GPUs. We’re also in the process of extending ROBOTCORE Framework (which implements REP-2009, among other things) to support tracing across various accelerators:

Nevertheless, what confuses me is how we’re twisting the argument here. Above, you claimed repeatedly that we had to focus on benchmarking at the input/output of test subjects, and we discussed how ros2_tracing can do that perfectly fine and more efficiently than other mechanisms. Now you seem to care about introspection and try to discard it for that reason (which is really hard to argue given the amount of research supporting ros2_tracing and LTTng).

@debjit it’d be great if you, RTI and other DDS vendors could have a look at REP-2014, share feedback and try using it down the road for benchmarking performance of DDS. We all know about the situation that happened not that long ago while comparing open source DDS implementations.
REP-2014 can help address future issues in this direction.

Ingo_Lutkebohle · November 14, 2022, 8:11am

I actually would like to support @ggrigor on this one, and I’m a bit confused by your comments. The suggestions he made do not argue against using metrics, or for only using throughput. In fact, I think they just take the existing content and make it clearer and more to the point, by removing some side-statements which only detract from the main point being made.

vmayoral · November 14, 2022, 9:39am

I didn’t say so. I just wanted to clarify that reporting on throughput “only” is not enough. This practice is currently being followed and I find it misleading.

In my view the current text reads well but I’ll take note of your preference, thanks for the input.
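To illustrate the point about throughput alone not being enough, here is a minimal sketch. The function name `summarize` and the two sample runs are hypothetical and not taken from REP-2014; the numbers are made up purely for illustration. Both runs report the same throughput and the same mean latency, yet their worst-case behavior differs a lot.

```python
from statistics import mean


def summarize(latencies_ms, duration_s):
    """Summarize one benchmark run with several metrics instead of a single one."""
    ordered = sorted(latencies_ms)
    n = len(ordered)
    # Nearest-rank 99th percentile, computed with integer arithmetic: ceil(0.99 * n)
    rank = (99 * n + 99) // 100
    return {
        "throughput_hz": n / duration_s,
        "mean_latency_ms": mean(ordered),
        "p99_latency_ms": ordered[rank - 1],
        "max_latency_ms": ordered[-1],
    }


# Two hypothetical runs, each processing 100 messages in 1 second
steady = [10.0] * 100                 # every message takes 10 ms
spiky = [5.0] * 98 + [250.0, 260.0]   # mostly fast, with two large spikes

steady_report = summarize(steady, 1.0)
spiky_report = summarize(spiky, 1.0)

for label, report in (("steady", steady_report), ("spiky", spiky_report)):
    print(label, report)
```

A reader looking only at throughput (or only at the mean) could not tell these runs apart, while the p99 and maximum latency separate them immediately.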

christophebedard · November 14, 2022, 10:09pm

@iluetkeb and @vmayoral already addressed this, but I wanted to emphasize these points:

I definitely do not think that REP 2014 is ros2_tracing, nor do I think that ros2_tracing is REP 2014.
ros2_tracing isn’t LTTng: as mentioned in the paper (Section IV-A), ros2_tracing is not strictly LTTng/Linux-specific. Sure, LTTng is the only tracer that is currently supported, but contributions that add support for other tracers are welcome!

As you said, benchmarking or analyzing the performance of a distributed, parallel, heterogeneous (etc.) system necessarily implies that a few underlying tools will be used. I don’t think we claim that we should use a single tool and that this tool should be ros2_tracing (or LTTng). As @vmayoral mentioned, some of my former research lab colleagues have done a lot of work on tracing AMD GPUs (e.g., recently using ROCm; see this presentation, this other presentation, and this demo). By combining execution data from all relevant sources (e.g., LTTng for Linux userspace & kernel, ROCm for AMD GPUs, ETW for Windows userspace & kernel, etc.), we can definitely achieve what you’re describing.
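As a rough sketch of what "combining execution data from all relevant sources" could look like once each tracer's output has been loaded: the `Event` type, the `merge_traces` helper, and the event names below are all hypothetical, and the sketch assumes the timestamps were already converted to a common clock (which is the hard part in practice and is not shown here).

```python
import heapq
from typing import Iterable, List, NamedTuple


class Event(NamedTuple):
    timestamp_ns: int  # assumed to be on a shared, already-synchronized clock
    source: str        # which tracer produced the event
    name: str          # hypothetical event name


def merge_traces(*streams: Iterable[Event]) -> List[Event]:
    """Merge per-source event streams (each already time-sorted) into one timeline."""
    return list(heapq.merge(*streams, key=lambda event: event.timestamp_ns))


# Hypothetical events from two sources
cpu_events = [
    Event(100, "lttng", "callback_start"),
    Event(400, "lttng", "callback_end"),
]
gpu_events = [
    Event(150, "rocm", "kernel_launch"),
    Event(350, "rocm", "kernel_complete"),
]

timeline = merge_traces(cpu_events, gpu_events)
for event in timeline:
    print(event.timestamp_ns, event.source, event.name)
```

With a single ordered timeline, the GPU kernel execution can be placed inside the CPU-side callback that triggered it, which is the kind of cross-hardware view being discussed.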

Perhaps we should just change the last section of REP 2014 to make it a bit more open/less ros2_tracing-specific to be more inclusive of things like heterogeneous systems? I’ll add a comment on the REP 2014 PR.

system · December 14, 2022, 10:09pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
