REP-2008 RFC - ROS 2 Hardware Acceleration Architecture and Conventions

vmayoral · September 1, 2021, 4:38pm

Glad to read the initiative is well received.

I think there are some good ideas in here, but some of your comments sound to me beyond the scope of what REPs (in general) contain and very technology-specific (e.g. how SDKs are managed or artefacts built), which is the opposite of what I’m trying to achieve by fighting so hard to keep it technology-agnostic. Xilinx is providing a reference implementation that anyone is welcome to imitate, but we should not impose this on other vendors.

Regarding the difficulties with colcon you point out, see the reference implementation of the firmware artifacts mentioned above (you’ll need to download the release to inspect the source code; GitHub doesn’t allow me to push it to the repo due to size limitations). That, in combination with the ament_vitis ament extensions, is the magic you mention. Note that both the firmware and the ament extensions are purposely built so that they can be replaced with other technologies.

See our first meeting recording for more context on the architecture and goals proposed in the REP.

Here we disagree, but I’d love to hear your thoughts back:

First, if you’re a maintainer, you do not have to write kernels. You can do so if you wish to dive into the hardware world, but that’ll rarely happen. In most cases, hardware experts will be the ones writing those kernels (especially employees of silicon vendors). This is the most likely scenario, and since it’s already happening (I’m doing it, and so are a few others), what makes the most sense is to coordinate a common abstraction layer that allows each of these vendors to interoperate from a higher-level perspective.
This is what was done successfully with other layers in the past. With DDS, for example, nobody forced the DDS vendors to change. Similarly, it’s unreasonable to expect silicon vendors to change their acceleration languages (as I argued above, there are good reasons why HIP, HLS and CUDA exist, and they will not disappear), since you’ll obtain the best performance with them on each of their technologies. Even if you’re a company that just produces IP, you’ll still benefit from having such a reference architecture and conventions. After all, you’d likely want to sell your IP in various formats to maximize revenue, which this REP favours.
Second, the proposed architecture gives you, as a ROS 2 package maintainer, the possibility to leverage kernels that others may have written, with a common and consistent syntax. To me, this refutes your claim that this REP doesn’t help.
Third, a user (a company building a particular robot) that wants to leverage hardware acceleration will typically look at different axes, including ease of use, integration with ROS 2 infrastructure (which is why the reference architecture extends ament and colcon and doesn’t reinvent the wheel), performance, determinism, power consumption, etc. This REP facilitates a common path that allows you not just to benchmark acceleration hardware, but also to switch across solutions (even within the same technology, e.g. going from an embedded edge KV260 to a workstation-like PCIe Alveo card for more acceleration capabilities).
Even if you have already picked the hardware you’ll be using and you’ll build the accelerator yourself, this REP aims to provide a consistent way to integrate hardware acceleration with ROS 2, with examples that can kickstart your development. I don’t see how that diminishes the value of this REP; quite the opposite.

LiyouZhou:

IMHO, we should go one step further, and unify the programming language for writing acceleratable kernels, through a generic DSL such as HALIDE https://halide-lang.org/. This way:

I write kernel only once.

The kernel can be compiled to run accelerated on every architecture CPU/GPU/FPGA.

Everything architecture-specific can be hidden from me.

This REP is a step in the right direction but I propose that it does not go far enough in terms of abstraction.

I assume Xilinx will not be very happy b/c halide does not emit to FPGA at the moment. But instead of still forcing robotics people to write FPGA specific code, Xilinx can provide a backend to halide. At the end of the day, roboticists does not know how to write good fpga code, xilinx does. So we should enable everyone to focus on what they are best at.

I like the HALIDE proposal. But again, I wouldn’t be overly ambitious if we want things to be actionable.

Don’t get me wrong, I’d love for the vision proposed to happen. It’s just that OpenCL has (as argued above) failed to force silicon vendors to converge on a kernel development language (though it’s widely used for host-to-kernel interaction). Arguably, it’s going to be hard for HALIDE to succeed where OpenCL failed.

The (best) way forward, though, is not to be exclusive but inclusive, as we attempted with the HAWG architecture. See the ament_halide block:

  ROS 2 stack                   HAWG @ ROS 2 stack

+-------------+             +--------------------+
|             |             |  xilinx_examples   |
| user land   |  +-------------------+-----------+-------+--------------+
|             |  |       Drivers     |     Libraries     |    Cloud     |
+-------------+  +---------------+---+--------+-------------------------+
|             |  |   ament_vitis | ament_halide |          |  accel_fw    |
|             |  +---------------+----------+-+----------+-+------------+
|  tooling    |  |     ament_acceleration   | colcon_accel |  accel_fw  |
|             |  +------------------------------------------------------+
|             |  |      build system        |   meta build |  firmware  |
+-------------+  +--------------------------+--------------+------------+
|     rcl     |
+-------------+
|     rmw     |
+-------------+
|   adapter   |
+-------------+
|             |
| middleware  |
|             |
|             |
+-------------+

Anyone motivated to push HALIDE forward can create ament_halide and provide a matching (HALIDE-wise) firmware following this REP’s architecture and conventions.
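
To make the layering in the diagram a bit more concrete, here is a minimal Python sketch of the idea, not the actual implementation. Every name in it (AccelerationBackend, VitisBackend, HalideBackend, AccelerationRegistry, the "vadd" kernel and the returned strings) is hypothetical and made up for illustration; the real ament_vitis and ament_acceleration packages are build-system extensions with their own interfaces. The only point is that a technology-agnostic layer can dispatch to interchangeable vendor-specific layers, so a HALIDE backend could sit next to the Vitis one without changing what the package maintainer writes.

```python
from abc import ABC, abstractmethod


class AccelerationBackend(ABC):
    """Hypothetical vendor-specific layer (the ament_vitis / ament_halide row)."""

    name = "abstract"

    @abstractmethod
    def build_kernel(self, kernel, firmware):
        """Return a description of what was built for the given firmware."""


class VitisBackend(AccelerationBackend):
    name = "vitis"

    def build_kernel(self, kernel, firmware):
        # Placeholder only: a real backend would invoke the vendor toolchain.
        return f"{kernel}@{firmware} [vitis]"


class HalideBackend(AccelerationBackend):
    name = "halide"

    def build_kernel(self, kernel, firmware):
        # Placeholder only: stands in for a HALIDE-based flow.
        return f"{kernel}@{firmware} [halide]"


class AccelerationRegistry:
    """Hypothetical technology-agnostic layer (the ament_acceleration row)."""

    def __init__(self):
        self._backends = {}

    def register(self, backend):
        self._backends[backend.name] = backend

    def build(self, backend_name, kernel, firmware):
        if backend_name not in self._backends:
            raise KeyError(f"no acceleration backend named {backend_name!r}")
        return self._backends[backend_name].build_kernel(kernel, firmware)


registry = AccelerationRegistry()
registry.register(VitisBackend())
registry.register(HalideBackend())

# The caller's code stays the same; only the backend name changes.
for backend_name in ("vitis", "halide"):
    print(registry.build(backend_name, "vadd", "kv260"))
```

In this sketch, adding a new technology means registering one more backend; nothing above the registry needs to change, which is the inclusive (rather than exclusive) approach argued for above.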

Have you considered putting person-months into this, @LiyouZhou? I’d be happy to walk you through the extensions needed to fit HALIDE into the current architecture, and to test things together. Note again that your view can be completely embedded into the existing architecture.

No need to wait for the work to be done! You can try things out today with Xilinx’s KV260 using the following ROS 2 packages which extend your ROS 2 workspace to include hardware acceleration:

build system
- ament_vitis
meta build tools
- colcon-acceleration
firmware
- acceleration_firmware
- acceleration_firmware_kv260
