Feedback on REP-2008 - ROS2 acceleration kernels with build integration

This post provides feedback on REP-2008. It is great to see the significantly re-written REP-2008 which makes it easier to understand and review. The decision to separate benchmarking into REP-2014 is also a welcome change. Thanks to the TSC for driving these changes in the updated proposal.

Comment on REP-2008 for Consideration

What this REP provides is an entry point for developers to define their own acceleration kernels, integrate them into ROS 2 applications and if appropriate, command the ROS 2 build system to build such accelerators.

The current name of REP-2008 does not match its stated purpose. The stated purpose is ‘to define acceleration kernels and integrate them into ROS with existing hardware acceleration standards’. A more representative name is “REP-2008 - ROS2 acceleration kernels with build integration”. Can this REP be renamed to accurately reflect its scope and purpose?

For example, the use of OpenCL, CUDA, Vulkan, HLS, Halide, Exo, etc. is beyond the scope of this REP and often attached to the specific accelerator in-use

REP-2008 declares that industry standard interfaces for hardware acceleration are out of scope. This covers open standards, industry standards, and proprietary interfaces, which appears to conflict with the following statement from the proposal:

A ROS 2 package supports hardware acceleration if it provides support for at least one of the supported hardware acceleration commercial solutions (or accelerators) that comply with this REP.

OpenCV, Vulkan, and others have implementations that provide hardware acceleration. The clause above implies that a package using OpenCV or Vulkan would not comply with this REP and therefore would not count as supporting hardware acceleration. This language should address building accelerated kernels and linking to those kernels. Broader statements on hardware acceleration are beyond the scope of the REP.

The architecture proposed in this REP is meant to be generic and technology-agnostic

The claim that this REP is vendor-neutral and unbiased would be stronger with an Altera FPGA implementation. Are there implementations to verify this as an ament_non-AMD producing accelerated kernels?

develop a hardware acceleration kernels for those functions identified before, and optimize the dataflow across Nodes

3.1 accelerate computations at the Node or Component level for each one of those functions identified in 2. as good candidates.

3.2 accelerate inter-Node exchanges and reduce the overhead of the ROS 2 message-passing system across all its abstraction layers.

Data flow optimization between nodes can be provided by REP-2007, which is implemented in RCL with ROS2 Humble. Defining it again in REP-2008 duplicates that work and adds confusion and fragmentation for developers. Can REP-2008 remove this?

In summary we propose:

  1. Rename to “REP-2008 - ROS2 acceleration kernels with build integration” to accurately reflect its scope and purpose.
  2. The “Specification” section language needs clarification around building accelerated kernels and linking to those kernels, not a broader statement on hardware acceleration which is out of scope in the REP.
  3. Add one or more example implementations to verify this as an ament_non-AMD producing accelerated kernels.
  4. Remove 3.2 from Methodology as this can be provided by REP-2007.

Thanks.


Glad to see NVIDIA finally voicing their opinion about these matters! As a minor clarification, these changes are not driven by the TSC. They are driven by Acceleration Robotics in cooperation with AMD, and also by the ROS 2 Hardware Acceleration community Working Group, all in an attempt to create a vendor-neutral environment for integrating acceleration kernels with ROS 2 packages while maintaining ROS 2 API compatibility to deliver hardware acceleration capabilities.

The TSC, as well as many others over the past year, have provided relevant feedback. We’ve taken it all into consideration and reviewed the REP accordingly. Very happy to read constructive feedback from NVIDIA now.

I believe it does match its stated purpose. It clearly proposes a reference architecture and a series of conventions for using hardware acceleration with ROS 2. It also proposes a methodology that should help maintainers and developers integrate hardware acceleration capabilities in a scalable manner while avoiding vendor lock-in.

I’m not against improving the naming though. You propose to focus on the build integration capabilities. I like this, as it’s the major resulting output, but it is only one of the additions this REP considers. There are more, including the firmware extensions and various other proposed conventions, which have repeatedly been discussed at the ROS 2 Hardware Acceleration WG. I’ll touch on them again today in my talk at ROSCon.

A better name building on your suggestion could be “REP-2008 - ROS 2 hardware acceleration reference architecture, conventions and build integration”; we could also go for “REP-2008 - ROS 2 hardware acceleration reference architecture, conventions and build integration” if preferred.

The text indicates that those languages are beyond the scope of this REP, meaning that it does not aim to set policy on what should be used. It doesn’t say that kernels written in those languages would not comply with this REP.

It does say that, for a package to claim support for hardware acceleration, it should support at least one accelerator that complies with the REP specification. This aims to put pressure on vendors to invest resources in aligning with this REP, so that developers can easily switch between accelerators if desired. That’s all. Again, the aim is to facilitate a vendor-neutral environment and to avoid vendor lock-in. I understand though that this is inconvenient for players that currently have dominant positions.

And we at the HAWG are happy to report we’ve been discussing things with Intel. Moreover, I’m happy to share again that, following feedback from Open Robotics, we did show non-AMD capabilities (in particular, we demonstrated minimal capabilities targeting NVIDIA’s silicon). A similar effort was performed with Microchip’s PolarFire SoC (RISC-V based). As reported previously, it’s up to silicon vendors to decide to invest in aligning. Unless the community sets policy, it’ll be a chicken-and-egg problem. I’ve said this repeatedly. Unless we set some sort of policy, the community will likely suffer from strong incompatibilities across acceleration solutions.

So, summarizing, it’s been shown. Improvements and further extensions to other vendors are on the way, but require more resources, which we’re securing.

I disagree with this. I think we want some general guidelines on how to implement hardware acceleration. REP-2007 treats a specific ROS 2 topic but doesn’t focus on hardware acceleration. This REP does.


Yes, it proposes a reference architecture for the building and linking. It proposes conventions for using acceleration kernels from the build output in ROS2.

The proposal itself is not a guideline for hardware acceleration, however, as that would mean guidance on what to implement in a hardware accelerator versus a CPU. Writing such a guide is quite difficult, as best practices are specific to the type of accelerator and its implementation, be it fixed function, FPGA, or high performance compute. This is why standards like OpenCV, Vulkan, etc. exist to abstract this problem away from developers, which is not what this REP provides.

REP-2008 is not needed where standards exist. The maintained package Image pipeline in ROS2 Humble uses OpenCV which can be accelerated by hardware, and REP-2008 was unnecessary to do this. What REP-2008 focuses on is to enable FPGA tools to compile ROS nodes, packages or ROS itself into firmware, and link that into the system.

REP-2008 contributions are to the build system, using the specified vendor specific build tools, firmware linking, and tracing / benchmarking (now part of REP-2014).

Hence the recommendation to make the name aligned to the problem the REP tries to solve, which is REP-2008 - Accelerated kernels with build integration.

It’s great HAWG is discussing this with Intel. It would be good to see Intel implement this REP for their Altera FPGAs so it’s not Xilinx specific.

Successful standards for accelerating functions, like OpenCV, Vulkan, etc., have broad developer and vendor support. What makes them succeed is not the number of people on a call, but active participation and contributions from developers and vendors with implementations.

It would be good to see evidence of this active participation and adoption.

Seems you’ve missed the problem REP-2007 solves.

This step is a duplication of REP-2007.

An accelerated ROS2 node needs to work with other non-accelerated nodes, on topics using standard types defined in ROS2. This structure provides the interoperability for developers to leverage nodes to create graphs performing functions for robotics.

REP-2008 claims to optimize dataflow across nodes, with no structure on how to do this. The vagueness in REP-2008 will create fragmentation and difficulty for developers in adoption.

REP-2007 provides an approved and implemented function to optimize topic communication between nodes while abstracting the platform and hardware specifics. Nodes are compatible by default, using standard topics with standard types. Adapting the type enables optimization. Type adaptation does not specify how to optimize the adapted type, as that is platform and hardware specific, but it provides the structure for those optimizations to remain compatible on topics.

For example, on an FPGA, one can use type adaptation (REP-2007) to communicate topics between accelerated firmwares on AXI by having the nodes on the CPU perform the negotiation. This structure allows hardware-optimized nodes that support AXI communication to subscribe to a topic while maintaining compatibility with nodes that do not support AXI communication.
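As a rough illustration of the structure described above, here is a minimal, standalone C++ sketch of the adapter pattern. It does not use rclcpp: `ImageMsg`, `DeviceImage`, `as_standard_message` and `demo_round_trip` are hypothetical names made up for this example, and the `TypeAdapter` template only loosely mirrors the shape of the real one (a custom type, a standard message type, and two conversion functions). The point it tries to show is that the hardware-specific type can travel between nodes that understand it, while a conversion to the standard type keeps every other node compatible.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a standard ROS 2 message type.
struct ImageMsg {
  std::uint32_t width = 0;
  std::uint32_t height = 0;
  std::vector<std::uint8_t> data;
};

// Hypothetical stand-in for an image held in accelerator-friendly form.
struct DeviceImage {
  std::uint32_t width = 0;
  std::uint32_t height = 0;
  std::vector<std::uint8_t> device_buffer;  // assumption: pretend this is device memory
};

// Primary template: by default a pair of types is not adapted.
template <typename CustomT, typename RosT>
struct TypeAdapter {
  static constexpr bool is_specialized = false;
};

// Specialization: declares how the custom type maps to the standard type.
template <>
struct TypeAdapter<DeviceImage, ImageMsg> {
  static constexpr bool is_specialized = true;
  using custom_type = DeviceImage;
  using ros_message_type = ImageMsg;

  static void convert_to_ros_message(const custom_type & src, ros_message_type & dst) {
    dst.width = src.width;
    dst.height = src.height;
    dst.data = src.device_buffer;  // copy out of the (pretend) device buffer
  }

  static void convert_to_custom(const ros_message_type & src, custom_type & dst) {
    dst.width = src.width;
    dst.height = src.height;
    dst.device_buffer = src.data;  // copy into the (pretend) device buffer
  }
};

// What a node without support for the adapted type would receive.
inline ImageMsg as_standard_message(const DeviceImage & img) {
  ImageMsg msg;
  TypeAdapter<DeviceImage, ImageMsg>::convert_to_ros_message(img, msg);
  return msg;
}

// Usage: custom -> standard -> custom keeps the payload intact.
inline void demo_round_trip() {
  DeviceImage src;
  src.width = 2;
  src.height = 1;
  src.device_buffer = {7, 9};

  const ImageMsg msg = as_standard_message(src);

  DeviceImage back;
  TypeAdapter<DeviceImage, ImageMsg>::convert_to_custom(msg, back);
  assert(back.device_buffer == src.device_buffer);
}
```

In this sketch the conversion is a plain copy; on real hardware it would be whatever transfer the platform needs, which is exactly the part type adaptation leaves unspecified.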

REP-2007 is proven in use, as we have delivered great hardware acceleration performance in ROS2 Humble, with low latency in 16 nodes for developers.

Acceleration of inter-node communication is provided in REP-2007. REP-2008 can refer to REP-2007.

Thanks.

Wrong. It ensures there’s a vendor-agnostic process to build and use acceleration kernels. Alternatively, like NVIDIA is doing, the path forward is a vendor lock-in approach. One that forks ROS 2 packages, changes ROS 2 APIs and further fragments the ROS ecosystem. This doesn’t benefit anyone. Not even NVIDIA in the long term.

Refer to the motivation section of REP-2007 to see the problems it solves. If you don’t see this goes beyond hardware acceleration then we certainly have a very different understanding.

I restate my disagreement: I think we want some general guidelines on hardware acceleration in REP-2008, to introduce hardware acceleration in ROS 2 in a vendor-neutral, scalable and technology-agnostic manner. I am happy to compromise though and add a pointer to REP-2007 and REP-2009 in 3.2, as these can be used for that purpose.


Hi all, I have followed REP-2008 since the beginning, as I have seen a big gap in current flows for building custom accelerators that are easy to use from a software perspective.

Every vendor is building its own extension to address customers’ need for building accelerators. All those approaches are lacking in one aspect: usability.

Why should a system architect build something with a vendor-specific extension, only to wrap it again with their own code to use the accelerated kernel from ROS? Architects (mostly software architects) are not aware of all acceleration possibilities and will not use the best architecture for their application if they are confronted with too much interfacing work.

Looking at the past, when hardware acceleration with FPGAs, GPUs and other architectures was already possible, every accelerator had to be attached by hand at the function level (e.g. a StereoMatcher, as with OpenCV) and then to the ROS layers. The next step was to give architects library functionality, which once more needed to be wrapped to reach the ROS layers and to make complete system building easy. With REP-2008, for the first time in my more than 10 years of working with ROS-based accelerators, all parts come together in an architect-friendly manner, including tracing and integration into the native build system. I can understand that adapting to this native way of working might be hard for some companies, as it breaks with vendor-specific adaptations that bind people to using JUST their accelerator.

I strongly suggest keeping this native way of working, to bring the needed acceleration to today’s robotics applications, to help robotics become the next dominant world-changing technology, and to let architects choose the best accelerator through an easy-to-use interface.


As a follow-up, REP-2008 - ROS 2 Hardware Acceleration Architecture and Conventions by vmayoral · Pull Request #324 · ros-infrastructure/rep · GitHub implements the above.