Proposal for ROS 2 Hardware Acceleration Working Group (HAWG)

Thanks for your comments and input, everyone! Lots of interesting feedback. We’re glad to see the interest our initiative raised. We’ve collected all the feedback and new ideas and plan to include them in a follow-up thread officially launching the WG.

We’ll give a few more weeks for everyone to share their thoughts, but stay tuned :wink: . A few comments from our side:

This is an interesting angle @kunaltyagi. I haven’t looked at SYCL but I believe this would be a great contribution/extension to the initial hardware acceleration architecture.

True @rgov, but we believe that by relying on open standards we’ll maximize our chances of succeeding here. That’s why we’re proposing to bet on C, C++ and OpenCL, which can be used today for programming hardware accelerators in many devices (across compute substrates, including FPGAs, CPUs and GPUs).

Dynamic Function eXchange (DFX), formerly known as Partial Reconfiguration (PR), is central to our approach at Xilinx @jopequ. That’s indeed a feature we will be leveraging, while integrating the capability directly into ROS 2 so that using it doesn’t require you to be a hardware expert.

Before diving into DFX though, getting up to speed with accelerators is a must (that’s how we came up with the plan above :slight_smile: ). We don’t yet have a timeline for DFX integration, but stay tuned if you’re interested, or ping me if you have the resources to push on that front and would like to contribute.

Welcome to the community @Ravenwater. HLS has traditionally been a pain, I agree. Things are changing though.

Though I certainly can’t generalize for all hardware, speaking for our solutions at Xilinx and more specifically our Kria portfolio, we propose three build targets for HLS:

  • Software Emulation (sw_emu) - The kernel code is compiled to run on the host processor. This allows iterative algorithm refinement through fast build-and-run loops. This target is useful for identifying syntax errors, performing source-level debugging of the kernel code running together with the application, and verifying the behavior of the system. Simply put, a transformation which runs all the code in an emulated processor matching the K26 SOM, as if there were no accelerator.

  • Hardware Emulation (hw_emu) - The kernel code is compiled into a hardware model (RTL), which is run in a dedicated simulator. This build-and-run loop takes longer but provides a detailed, cycle-accurate view of kernel activity. This target is useful for testing the functionality of the logic that will go in the FPGA and getting initial performance estimates. In other words, a simulation within an emulation: the FPGA is simulated and runs inside an emulation (QEMU), alongside the emulated processors, making it faster to obtain performance estimates.

  • Hardware (hw) - The kernel code is compiled into a hardware model (RTL) and then implemented on the FPGA, resulting in a binary that will run on the actual FPGA.

sw_emu allows you to run both the host code and the accelerator inside an emulator (QEMU), and the transformation takes about a minute. This includes the kernel’s code, which runs as if it were host code. I’m using this approach every day and I believe it is sufficient to meet the common software development flows in ROS/robotics.

Here are some numbers I just produced for the sake of the argument:

| Build target | ROS 2 accelerated package build time |
|---|---|
| sw_emu | ≈ 1 minute |
| hw_emu | ≈ 4 minutes |
| hw | ≈ 23 minutes |

Note that this depends heavily on the characteristics of your developer workstation. Mine is based on an AMD Ryzen 5 PRO 4650G.

These are approximate and based on a ROS 2 package in my current workspace (including host code and kernel) which includes a very simple kernel.
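To put the figures above in perspective, here is a small back-of-the-envelope sketch. It takes the approximate build times I reported (1, 4 and 23 minutes) at face value and estimates how many build-and-run iterations fit in an hour for each target, ignoring run time. The helper names are just illustrative and not part of any tooling.

```python
# Approximate build times in minutes, as reported above for a very simple kernel.
build_minutes = {"sw_emu": 1, "hw_emu": 4, "hw": 23}


def iterations_per_hour(minutes_per_build):
    """Whole build iterations that fit in 60 minutes (run time ignored)."""
    return int(60 // minutes_per_build)


def slowdown(target, baseline="sw_emu"):
    """How many times longer `target` takes to build than `baseline`."""
    return build_minutes[target] / build_minutes[baseline]


for target, minutes in build_minutes.items():
    print(
        f"{target}: ~{minutes} min/build, "
        f"~{iterations_per_hour(minutes)} builds/hour, "
        f"{slowdown(target):.0f}x the sw_emu build time"
    )
```

Under these assumptions, sw_emu allows roughly 60 iterations per hour against about 2 for hw, which is why I lean on sw_emu for day-to-day development.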

I’d be curious to hear your thoughts on this, @Ravenwater and everyone else. I agree nevertheless that we should push towards a similar user experience, both in build time and in development flow.


@vmayoral The sw_emu/hw_emu/hw targets you describe are solid and will be very productive. Absolutely thrilled that this is coming to the community.


Hi @vmayoral, which ROS package are you building, and which kind of acceleration functions are you using for the tests?

What is going to be the target platform? Kria KV260? How do you manage different FPGA board resources if I want to use the kernel on another board?

Do I need to compile the kernel and FPGA design every time on my own, or will I be able to download it?

Hi @vmayoral,
I am currently a (Xilinx) FPGA user. I would love to help with the project on open-source kernel acceleration.


Thanks to everyone that showed interest! WG announced and first meeting called for. See details at Announcing the Hardware Acceleration WG, meeting #1.

In case you’ve missed it, the first meeting of the WG will happen next week at: 2021-06-30T18:00:00Z.

  • Coordinates: Zoom
    • Phone one-tap: US: +17209289299,99299967182#,0#,8504098917# or +19292056099,99299967182#,0#,8504098917#
    • Meeting URL: Launch Meeting - Zoom
    • Meeting ID: 992 9996 7182
    • Passcode: Xk.X73&rNY
  • Preliminary agenda:

    1. Introductions
    2. ROS 2 Hardware Acceleration WG, quick review of objectives, rationale and overview
    3. Initial hardware acceleration architecture for ROS 2 and short demonstrations
    4. Community hardware platforms (e.g. Ultra96-v2), process and steps
    5. Q&A
    6. (your acceleration project)

Hello @vmayoral ,

I am an Embedded Engineer with some experience in FPGAs. Really interested to join this group and contribute. How can I get started?


@kscharan, you can get started by checking the resources at ROS 2 Hardware Acceleration Working Group · GitHub. Then, watch the first two HAWG meetings and review the resources:

Stay tuned for upcoming ones and open a ticket at GitHub - ros-acceleration/community: WG governance model & list of projects if you have any ideas/projects where you’d like to contribute.


Thanks @vmayoral for the quick reply. Excited to get started with my Ultra96v2. :relieved:

Back in early 2021 we proposed this WG and kicked off activities with the following goals:

I’m happy to report on the following progress that happened during 2021:

The ROS 2 Hardware Acceleration WG work reached more than 250,000 users/roboticists and generated more than 2,000 reactions. The community repo has ~200 biweekly views and the recorded meetings have more than 1,000 views (data disclosed in the ROS 2 Hardware Acceleration Working Group 2021 dissemination report).

| Target | Description |
|---|---|
| 2021 :white_check_mark: | 1) Design tools and conventions to seamlessly integrate acceleration kernels and related embedded binaries into the ROS 2 computational graphs, leveraging its existing build system (ament_acceleration extensions) [1], meta build tools (colcon-acceleration extension) and a new firmware layer (acceleration_firmware) [2]. |
| 2021 :white_check_mark: | 2) Provide reference examples and blueprints for acceleration architectures used in ROS 2 and Gazebo. |
| 2022 :white_check_mark: | 3) Facilitate testing environments that allow benchmarking accelerators, with special focus on power consumption and time spent on computations (see HAWG benchmarking approach, community#9, tracetools_acceleration, ros2_kria). |
| 2022 :warning: | 4) Survey the community interests on acceleration for ROS 2 and Gazebo (see discourse announcement, survey). |
| 2022 :warning: | 5) Produce demonstrators with robot components, real robots and fleets that include acceleration to meet their targets (see acceleration_examples). |

During today’s meeting (2022-01-25T18:00:00Z), we’ll go through these and present a new set of objectives for 2022.

  1. See ament_vitis ↩︎

  2. See acceleration_firmware_kv260 for an exemplary vendor extension of the acceleration_firmware package ↩︎

Hi, and great work addressing this important issue. How is the support for writing accelerators in HDLs? I tried building a couple of the examples and the HLS/OpenCL flow works really well, plug-and-play. One thing that I find tedious with SoC FPGA design is manually creating the AXI interfaces to the PS, DDR and the SW drivers. Are there ways to automate this with the colcon/ament extensions for accelerators written in an HDL as well?


Awesome to hear @erlingrj, thanks and keep the feedback coming please.

I am not aware of anyone looking at this from the ROS side at the moment. From a Xilinx tooling perspective, this is fully supported, but for now you need hardware skills to use it. A few pointers for hardware engineers:

The current CMake macros to integrate Vitis capabilities (ament_vitis) only allow you to either a) generate acceleration kernels (technically, .xo files, Xilinx Objects) from C++ [1] or b) link together (place & route) various kernels [2]. Ideally, we’d have CMake macros that generate a kernel out of HDL sources. The way to implement this would be to generate a Tcl file and pass that to Vivado for .xo packaging. This is demonstrated in this example. I prototyped Tcl script generation from CMake macros a while ago here.

If you have development cycles @erlingrj and want to merge both of these last pointers together, I’d be happy to review a PR with a new CMake macro for that at ament_vitis. From that, it’d be pretty easy to get a full example using those macros and RTL within acceleration_examples.
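As a rough illustration of the "generate a Tcl file and hand it to Vivado" idea, here is a minimal sketch written in Python rather than CMake, purely to show the shape of the generated script. It is not how ament_vitis works today. It assumes the RTL has already been packaged as a Vivado IP in `ip_dir`, and it relies on Vivado's `package_xo` Tcl command; depending on the tool version, additional options (for instance a kernel XML description or a control protocol) may be required. The function name, kernel name and paths are hypothetical.

```python
def render_package_xo_tcl(kernel_name, ip_dir, xo_path):
    """Return the text of a Tcl script that packages an RTL kernel as a .xo.

    Assumption: `ip_dir` already contains a packaged Vivado IP for the kernel.
    The script would then be run with something like:
        vivado -mode batch -source package_kernel.tcl
    """
    lines = [
        "# Generated file: packages an RTL kernel as a Xilinx Object (.xo)",
        # Braces are Tcl quoting, so paths with spaces are passed as one word.
        f"package_xo -xo_path {{{xo_path}}} "
        f"-kernel_name {kernel_name} "
        f"-ip_directory {{{ip_dir}}}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical kernel name and paths, for illustration only.
script = render_package_xo_tcl("vadd_rtl", "./packaged_kernel", "./vadd_rtl.xo")
print(script)
```

A CMake macro could follow the same pattern: fill in a template with the kernel name and paths at configure time, then invoke Vivado on the resulting file as a custom build step.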

Can you describe a particular ROS use case that drives this ask (node/component/computational graph)?

  1. see vitis_acceleration_kernel ↩︎

  2. see vitis_link_kernel ↩︎

Thanks for your reply @vmayoral. I am not that well-versed in Vitis and Xilinx Objects. I am gonna take a good look at your references. I could be interested in contributing a flow for RTL kernels. But first I would need to add support for the Zedboard (as it is my only Zynq board).


Sure, refer to the ticket tracking the port to the Ultra96v2. There are lots of good bits in there on how to bring the architecture we’ve put together up to speed on a new board.