Robotic Processing Unit and ROBOTCORE

Following on from the Hardware accelerated ROS 2 pipelines and towards the Robotic Processing Unit (RPU) thread, I’m happy to share that we’ve publicly released and open sourced the design files of the Robotic Processing Unit subproject of the ROS 2 Hardware Acceleration Working Group.

The Robotic Processing Unit (RPU[1]) is a robot-specific processing unit that uses hardware acceleration and maps robotics computations efficiently to its CPUs, FPGAs and GPUs to obtain the best performance. In particular, it specializes in accelerating Robot Operating System (ROS 2) robot computational graphs on the underlying compute resources.

The goal of this subproject is to provide the WG and other robotic architects out there with a reference hardware blueprint for building hardware accelerated ROS 2 graphs that use the best accelerator for each task. To do so, the project leverages existing off-the-shelf hardware acceleration development platforms, in particular popular ones from AMD and NVIDIA, packing together 16x CPUs, a GPU and an FPGA. The resulting assembly is used to prototype a robot-specific processing unit that aims to perform best on ROS 2 and robot computational graphs.

Parts to assemble the different development boards can be 3D printed. The design files are disclosed in the repository and the assembly instructions are pretty straightforward:

Step 0: Here’s the exploded view of the Robotic Processing Unit, which should help guide the process of building your own. Start by soldering the power wiring to the KR260 and AGX Orin boards separately. You need to do so on the power jack pins of each one of the robotics development kits[1:1].
Step 1: Screw the KR260 board to the KR260 adapter with 4x M3 bolts (ISO7046) and 4x M3 nuts (DIN934).
Step 2: Screw the AGX Orin board to the AGX Orin adapter with the original bolts included in the development kit.
Step 3: Connect the previously soldered wires on both dev boards to the PYBE30-Q24-S12-T DC-DC converter. Power input should come from the AGX Orin, and the regulated output goes to the KR260.
Step 4: Screw the PYBE30-Q24-S12-T DC-DC converter, the AGX Orin and the KR260 to the base. Use 10x M3 bolts (ISO7046) and 10x M3 nuts (DIN934).
Step 5: Finally, align the 4 holes of the cover with the base and join them with 4x M5 bolts (ISO7046).

The Robotic Processing Unit will be used to benchmark and develop further ROS 2 API-compatible hardware acceleration tools and robot Intellectual Property (IP) cores within the community Working Group. Finally, we at Acceleration Robotics will be offering commercial support and a ruggedized version (ROBOTCORE®) of the Robotic Processing Unit to further incentivize its use.


  1. We’re very aware that the RPU acronym is overloaded and also used to refer to other types of processing units, including Remote Processing Unit, Ray Processing Unit, Real-time Processing Unit, Radio Processing Unit, Regional Processing Unit or RAID Processing Unit, among others. See the comparison for fun. If you have a better suggestion for the acronym, let us know. ↩︎ ↩︎

8 Likes

It is probably worth mentioning that unless your robotics application requires machine learning or computer vision, this hardware is probably overkill. ROS 2 nodes and the underlying communication are pretty lightweight and will work just fine on almost any embedded SBC that can run Ubuntu (for example, the latest Raspberry Pi).

1 Like

Raspberry Pi SBCs are great and I’ve built many robots with them. They are a great starting point for prototyping simple robots and for people getting started in robotics, but they quickly fall short for many robotic applications. I encourage you to drop by the ROS 2 Hardware Acceleration Working Group and try to understand a bit of the work we’re doing there and the value behind accelerators, including the possibility of creating custom compute architectures for building computational graphs. There are plenty of projects and literature to read, and benchmarks we’ve been producing, which may help your understanding.

Also, you seem to be missing that robots are deterministic machines. Meeting time deadlines in their computations (real-time) is the most important feature; however, other characteristics are also relevant when designing robotic computations, including the time between the start and the completion of a task (latency), the total amount of work done in a given time (bandwidth or throughput), and whether a task happens in exactly the same timeframe each time (determinism). CPUs are widely used in robotics due to their availability; however, they hardly provide real-time and safety guarantees while delivering high throughput. The de facto strategy in industry [1] to meet timing deadlines is a laborious, empirical, and case-by-case tuning of the system. This “CPU whack-a-mole” approach in robotics is unsustainable and hard to scale due to the lack of a hardware-supported, timing-safe, event-driven programming interface in CPUs. Hardware acceleration (with FPGAs, GPUs or other accelerators) presents an answer to this problem: one that allows the robotics architect to create custom computing architectures for robots that comply with real-time and bandwidth requirements, while lowering power consumption.
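To make the “whack-a-mole” tuning concrete, here’s a minimal sketch of how one typically quantifies CPU scheduling jitter on Linux: a SCHED_FIFO thread released at 1 kHz, recording the worst-case wake-up latency. The priority and iteration count are illustrative assumptions, not values from our benchmarks.

```cpp
// Sketch: measure worst-case wake-up latency of a 1 kHz SCHED_FIFO thread.
// Requires root (or CAP_SYS_NICE) for the real-time priority.
#include <cstdio>
#include <cstdint>
#include <pthread.h>
#include <sched.h>
#include <time.h>

int main() {
  sched_param sp{};
  sp.sched_priority = 80;  // illustrative high RT priority
  if (pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp) != 0) {
    std::perror("pthread_setschedparam");
    return 1;
  }
  timespec next{};
  clock_gettime(CLOCK_MONOTONIC, &next);
  int64_t worst_ns = 0;
  for (int i = 0; i < 10000; ++i) {  // 10 s at 1 kHz
    next.tv_nsec += 1'000'000;       // 1 ms period
    if (next.tv_nsec >= 1'000'000'000) {
      next.tv_nsec -= 1'000'000'000;
      ++next.tv_sec;
    }
    clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, nullptr);
    timespec now{};
    clock_gettime(CLOCK_MONOTONIC, &now);
    // Difference between requested and actual wake-up time.
    int64_t lat = (now.tv_sec - next.tv_sec) * 1'000'000'000LL +
                  (now.tv_nsec - next.tv_nsec);
    if (lat > worst_ns) worst_ns = lat;
  }
  std::printf("worst-case wake-up latency: %lld ns\n",
              static_cast<long long>(worst_ns));
  return 0;
}
```

Run it under load and the worst case balloons; the usual “fix” is exactly the per-system tuning (IRQ affinities, CPU isolation, kernel options) described above.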

Sure, you probably don’t want to use this type of hardware to “Learn Robotics” :wink:, but there are plenty of use cases in autonomous mobility (AMRs, self-driving X, construction, mining, etc.), industrial manipulation and healthcare robots (e.g. surgical robots) where you’d be surprised how well this fits the needs.

Not true in my experience. Accelerators can be useful across robotic applications involving the whole robotics stack, from sensing to perception, mapping, localization, motion control, low-level control and all the way to actuation. They can help you plan grasps faster or avoid obstacles faster. They can even help you speed up your ROS 2 coordinate system transformations (benchmarks for tf2).
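For illustration, here’s a hedged sketch (my own, not the WG’s benchmark code; frame names are placeholders) of the kind of tf2 lookup such benchmarks time:

```cpp
// Sketch: timing a single tf2 transform lookup inside a ROS 2 process.
// A real benchmark would repeat the lookup many times against a
// pre-populated buffer.
#include <chrono>
#include "rclcpp/rclcpp.hpp"
#include "tf2_ros/buffer.h"
#include "tf2_ros/transform_listener.h"

int main(int argc, char **argv) {
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("tf2_lookup_timing");
  tf2_ros::Buffer buffer(node->get_clock());
  tf2_ros::TransformListener listener(buffer);  // listens to /tf in a background thread

  try {
    auto start = std::chrono::steady_clock::now();
    // Latest available transform between two (placeholder) frames.
    auto tf = buffer.lookupTransform("map", "base_link", tf2::TimePointZero);
    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
                  std::chrono::steady_clock::now() - start).count();
    RCLCPP_INFO(node->get_logger(), "lookup took %lld ns",
                static_cast<long long>(ns));
  } catch (const tf2::TransformException &ex) {
    // Thrown when no transform data has arrived yet.
    RCLCPP_WARN(node->get_logger(), "no transform available yet: %s", ex.what());
  }
  rclcpp::shutdown();
  return 0;
}
```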

The reason you probably think this way is that there just aren’t enough acceleration kernels publicly available for robotic tasks other than ML and CV, but there will be.


  1. Liu, S., Zhu, Y., Yu, B., Gaudiot, J. L., & Gao, G. R. (2021). The Promise of Dataflow Architectures in the Design of Processing Systems for Autonomous Machines. arXiv preprint arXiv:2109.07047. ↩︎

4 Likes


I’m not super in-the-loop regarding the hardware working group, but might I ask how you plan to achieve real-time guarantees in this architecture? FPGAs are great for real-time determinism, but GPUs are typically worse for determinism than CPUs, as they lack integration into an RTOS and you’re left with NVIDIA’s default scheduler.

Also, are you running any RTOS on the CPUs? Last I checked, the Jetson platform didn’t support any RTOS; only the NVIDIA automotive version of Orin supports a QNX-based OS.

I agree and share your concerns about GPUs. Similar to CPUs, they have memory-centric Von Neumann architectures, which makes determinism and real-time much more challenging than on other compute substrates. FPGAs and MCUs generally respond much better to this and are comparably much easier to use when aiming for (hard) real-time. The Robotic Processing Unit is mindful of this and packs together lots of compute, including 16x CPUs, a GPU and an FPGA, mixing technologies from AMD and NVIDIA.

Besides these big building blocks for computations, each of these two groups packs additional compute units meant specifically for real-time. In the case of AMD’s Kria KR260, besides the FPGA (and the possibilities there with soft-cores), you have a dual-core 32-bit Arm® Cortex-R5F real-time processor (CPU max freq 600 MHz) on which you can run bare-metal code or RTOSs (e.g. FreeRTOS or NuttX are good choices). Similarly, the Jetson AGX Orin features a Sensor Processing Engine (SPE), an Arm® Cortex-R5 MCU that can also run RTOSs like FreeRTOS. Overall, you’ve got plenty of hard real-time capable choices in the Robotic Processing Unit, provided you architect things appropriately. The challenge with these SoCs (and groups of them interconnected) lies precisely there, in the architecture, and that’s what we’re tackling in the ROS 2 Hardware Acceleration Working Group and why REP-2008 is so relevant (simplifying support for various accelerators in a single, consistent, ROS-centric flow).
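For a flavour of what runs on those R5 cores, below is a minimal FreeRTOS sketch of a hard-periodic 1 kHz task, as one might deploy bare-metal on the KR260’s Cortex-R5F or the Orin’s SPE. It assumes a BSP with configTICK_RATE_HZ of at least 1000; the sensing/actuation hook is a placeholder.

```cpp
// Minimal FreeRTOS sketch: a hard-periodic 1 kHz control task.
// Assumes configTICK_RATE_HZ >= 1000 in FreeRTOSConfig.h.
#include "FreeRTOS.h"
#include "task.h"

static void read_sensor_and_actuate(void) {
  // Placeholder: sample encoders, update PWM outputs, etc.
}

static void control_task(void *params) {
  (void)params;
  TickType_t last_wake = xTaskGetTickCount();
  const TickType_t period = pdMS_TO_TICKS(1);  // 1 ms period
  for (;;) {
    read_sensor_and_actuate();
    vTaskDelayUntil(&last_wake, period);  // deterministic periodic release
  }
}

int main(void) {
  // Highest priority so the control loop preempts everything else.
  xTaskCreate(control_task, "ctrl", configMINIMAL_STACK_SIZE, NULL,
              configMAX_PRIORITIES - 1, NULL);
  vTaskStartScheduler();  // never returns once the scheduler starts
  for (;;) {}
}
```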

In case you wonder about the interconnection between the groups in the Robotic Processing Unit: they’re interconnected over a common Ethernet databus, which allows combining the traditional control-driven approach used in robotics with a data-driven one. It also aligns nicely with DDS and the base assumptions of most DDS vendors. We’ve done lots of testing, mainly with PCIe and Ethernet, and determined Ethernet to be the best choice to meet time deadlines. PCIe is promising and great for throughput, but it behaves really weirdly when you try to obtain real-time capabilities from it.
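To illustrate what aligning with DDS can look like in practice, here’s a hedged rclcpp sketch (my own, not from the RPU sources) of publisher QoS one might pick on such a shared databus; the topic name, deadline and period are illustrative assumptions:

```cpp
// Sketch: a latency-conscious publisher on a shared Ethernet databus.
#include <chrono>
#include "rclcpp/rclcpp.hpp"
#include "std_msgs/msg/string.hpp"

int main(int argc, char **argv) {
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("rpu_databus_node");

  rclcpp::QoS qos(rclcpp::KeepLast(1));                      // newest sample only
  qos.best_effort();                                         // avoid retransmission stalls
  qos.deadline(rclcpp::Duration(std::chrono::milliseconds(10)));  // DDS flags missed deadlines

  auto pub = node->create_publisher<std_msgs::msg::String>("rpu/telemetry", qos);
  auto timer = node->create_wall_timer(std::chrono::milliseconds(10), [pub]() {
    std_msgs::msg::String msg;
    msg.data = "tick";
    pub->publish(msg);
  });

  rclcpp::spin(node);
  rclcpp::shutdown();
  return 0;
}
```

The point of the deadline QoS is that the middleware itself reports timing violations, instead of leaving them to be discovered by downstream tuning.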

AMD’s Kria KR260 provides support for the open source Xen hypervisor[1] and a portfolio of RTOSs to choose from for the various CPUs. I have tested this extensively and, though complex to integrate, can recommend it; it delivers really good results. In the case of the Jetson AGX Orin, you have RedHawk Linux RTOS support, but you’re right in the sense that more general RTOSs like QNX seem to be available only for the driving (premium) solutions.

This shouldn’t surprise you much, though. NVIDIA’s approach to open source has always been quite controversial, and some have accused it of undermining open source. Quoting Linus Torvalds (2012)[2]:

I’m also happy to very publicly point out that Nvidia has been one of the worst trouble spots we’ve had with hardware manufacturers, and that is really sad because then Nvidia tries to sell chips - a lot of chips - into the Android Market. Nvidia has been the single worst company we’ve ever dealt with.

Even today in their ROS packages you’ll find plenty of static libraries in binary format, which you can’t adapt to your needs. NVIDIA continues (in my opinion) to play games with licensing, forking ROS Apache 2.0 (or similarly commercially friendly) code and re-licensing it to suit their needs. This is my major criticism of them these days. I complained about this in the past, which triggered a license change, yet the trend continues.

As it stands, NVIDIA is still hard to work with. It took them years to acknowledge that their GEMs approach wasn’t worth much without ROS. Fortunately they did, and their discourse is now all about ROS :stuck_out_tongue: !


  1. See Real-time ROS 2 — KRS 1.0 documentation for more details on how to achieve proper real-time with AMD’s Kria solutions. ↩︎

  2. https://www.youtube.com/watch?v=iYWzMvlj2RQ ↩︎

2 Likes

Hi!
I like the idea, but what I really wonder (and this extends also to the list of ROS robotics companies) is how you are supposed to deploy ROS 2 in a product, be it the RPU or any other board for that purpose (especially when not x86 based). Installing Ubuntu on some development board does not cut it, to me at least. There is no way around a proper embedded Linux, hence Buildroot or OpenEmbedded, which enable you to:

  • Use an optimized, possibly PREEMPT_RT, kernel
  • Create an SBOM and manage licensing somewhat decently
  • Create updates that are atomic
  • Avoid relying on other companies to support your hardware with prebuilt binaries/packages
  • Remove stuff that is not needed
  • Optimize for the platform used
  • Downsize for cost optimization

LG no longer updates meta-ros, and I really wonder why it seems nobody really cares…

Regards,
Matthias

2 Likes

I agree @DasRoteSkelett, it’s hard to avoid OpenEmbedded/Yocto (or buildroot, though I’m personally more in favour of OE/Yocto) for production.

Though there are some building products on top of Ubuntu, in my experience Yocto is typically the path forward when creating a product; it allows for the customizations you mention above and many more, including additional security or leveraging a hypervisor. With this last bit, you can actually create various VMs, each with its own rootfs. You can still leverage the Ubuntu rootfs in one such VM, but building such partitions requires you to step away from the Ubuntu experience.

We wrote about this not long ago at ROS 2 Humble with Yocto and PetaLinux.

I share the feeling :+1:, but our group cares (the same group behind ROBOTCORE® and the RPU) and is the one behind the effort to bring Yocto support to Humble (see the Humble release thread and Humble support in Yocto (Honister)). By the way, ROBOTCORE® (product, see tech specs) offers Yocto support, and Acceleration Robotics offers consulting services around Yocto as well.

Community contributions-wise, I actually updated the recipes a few weeks ago over the summer (see open PR).

I also volunteered in the past to take over maintenance of meta-ros but never heard back from anyone. Let me reach out again to @tfoote and @Katherine_Scott to see if we can address this.

You can add me to the OE/Yocto camp. Software has to be upgraded, but the common approach is to release for one specific OS (usually not even an LTS version) and get locked into it without any security updates. At least with Yocto you can lock in a stable version and apply updates and security patches on top; rolling your own backports, or fighting the cascading dependency hell caused by updates to the rest of an OS that the specific software package never tracks, is a losing battle. A container can help with this, but then you effectively have to load two (or more) operating systems onto the same (constrained) hardware.

The issue is with the early adopters like hobbyists or academics: they want something easy to set up, with a low barrier to entry. Yocto isn’t that (it is getting better with every release), and without, say, publicly available and maintained images (with, say, ROS 2), it probably won’t appeal to that user base, as they can’t get up and running easily.

1 Like

Have you considered using GNU Guix as an alternative to the Yocto Project?

1 Like

Hi,

I would like to see Android become one of the supported embedded / robotics base platforms.

Many people disregard Android as bloated due to the usage of Java / Kotlin to implement the higher-level APIs. But it is actually possible to build Android Core without even enabling the Java runtime (ART).

At the same time, Android has a lot of features that are important for embedded / robotics use cases. Just a few examples:

  1. Huge development ecosystem: a large Google development team, every SoC vendor, and every phone and tablet device maker has Android development teams with system-level programming expertise. You can’t (and shouldn’t :slight_smile: ) get a decent SoC that has no Android support from the vendor.
  2. Security: Android has a very detailed set of SELinux policies, secure boot, and secure transactional updates, and its whole security architecture is constantly being refined due to being a high-value target for attackers. There are also extra hardened versions of Android, like GrapheneOS.
  3. Stable kernel API: This is an important project from Google that makes the lives of the hardware vendors easier while allowing them to upgrade to newer base kernels that bring in new features and more safety.
  4. Hardware Abstraction Layer (HAL): Android provides a very nice, IPC-based HAL that splits responsibilities between the SoC and other component vendors on one side and the OEMs building Android systems on the other. This would be very useful in robotics as well: ROS sensor and actuator nodes could be implemented against a well-defined, device-independent HAL interface, and the different hardware vendors would only have to implement that interface in their drivers. This is already done for cameras and other devices used in phones.
  5. Binder IPC: One of the great additions of Android was the introduction of the Binder IPC, which provides way more features compared to the usual POSIX IPC mechanisms. It should be possible to implement a fast RMW based on Binder (either directly, or as a local fast path for another network-enabled system like Zenoh).
  6. Android Automotive OS: Android is no longer only targeting handheld consumer devices. With the Android Automotive OS Google is already investing a lot into features that are required for robotics applications as well, like planning for long-term security.
  7. Development Tools: I still remember when the Android source code was first released in 2008. With a single make command we could build the whole userland of a completely functional smartphone. I had been involved with a lot of different proprietary and open-source embedded platforms at the time (including Maemo, OpenEmbedded, and OpenMoko), and Android was way easier to work with. Later, as Android grew, things became more complex, and I think that the Soong build system was not the right move, but now with the Bazel transition things are going in the right direction again. I think that Bazel would be a great alternative build system for ROS 2 as well, and there is already a 3rd party project providing the necessary Bazel build rules for ROS 2.
  8. Safety with Rust: Rust is slowly (or rather pretty quickly…) becoming the de-facto system-level programming language for safety-critical environments. The Android team is leading many of these efforts by doing the work to add Rust support to the kernel and to allow the development of services in Rust in Android itself.
  9. Profiling and Introspection: Performance is obviously an important concern in Android, and while we have access to eBPF and other low-level performance profiling facilities of Linux, Google also included Perfetto, a very nice system-level profiling framework. Due to its design, it is even feasible to keep it included in production builds, and simply disable it at runtime. It is capable of collecting traces of many system components all at once and analyzing them using the easy-to-use web-based UI.

Using Android in Headless Systems was initially proposed more than 10 years ago, and even Google tried to provide a solution for that with the short-lived Android Things side-project. I think Android Things ultimately failed because it was not open-sourced, so the community could not build upon the base provided by Google and extend it to new use cases.

I have been working with Android on various non-phone device projects (like set-top boxes) since the very beginning (got it running on a Sharp Zaurus even before it was open sourced. :slight_smile: )

I think that now Android is in a much better shape to add a “robotics / IoT flavor” than before. It even already has a fully virtualized platform port called Cuttlefish, which is also used by the Automotive version.

I encourage everyone to take a look around the source.android.com website and check out the technical documentation on the different aspects of Android.

Kind regards,
Gergely

PS: Shameless plug: If you are considering Android as an option for your Robotics platform, I am available for consulting and would be happy to talk with you. :slight_smile:

3 Likes

Just learning about this project. Interesting. Watched the talk; it certainly dives into some of the pain points. Will have a closer look and continue reading. My major concern, though, is availability/support for hardware.

Do you have any good experiences with it on production-grade hardware that you can share, @peterpolidoro? If so, with which boards?

Yes, that is a great point and a valid concern. I think more work needs to be done before it is ready for production on embedded boards, but the purely functional deployment model and the emphasis on reproducibility and on solving the problem of dependency hell make it a very promising option for both the embedded and non-embedded ROS ecosystem.

GNU Guix can cross-compile packages for a variety of supported targets:

  • aarch64-linux-gnu
  • arm-linux-gnueabihf
  • i586-pc-gnu
  • i686-linux-gnu
  • i686-w64-mingw32
  • mips64el-linux-gnu
  • powerpc-linux-gnu
  • powerpc64le-linux-gnu
  • riscv64-linux-gnu
  • x86_64-linux-gnu
  • x86_64-w64-mingw32

It is true, though, that most of the focus does still get placed on x86_64 for desktops, laptops, servers, and HPC clusters, but there is no reason why it should not work just as well on every architecture.
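For instance (a hedged one-liner; package availability per target varies), cross-building the GNU hello package for 64-bit Arm from an x86_64 host is just `guix build --target=aarch64-linux-gnu hello`.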

GNU Guix can be used as a functional package manager on top of other GNU/Linux distributions, such as Ubuntu, either in addition to or instead of, other packaging options such as apt, pypi, go, crates, traditional ROS packages, etc.

Guix has a feature I love called guix shell, which is sort of like a combination of a generalized Python virtual environment (for any software language, not just Python) and Docker. You can use it to quickly create a shell environment, or a container, that has only a package and its dependencies and nothing else, or only the packages and dependencies necessary to develop a package. It can be used like a ROS workspace or a ROS Docker container, but including only the bare minimum packages and not, say, all of Ubuntu.
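As a hedged illustration (the package names are just examples), `guix shell --container python python-numpy -- python3` drops you into an isolated container whose only contents are Python, NumPy and their dependencies; adding `--development some-package` instead gives you a build environment for hacking on some-package.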

Guix can be used to create its own GNU/Linux distribution, called GNU Guix System, using a declarative operating system configuration rather than the imperative approach taken by, say, Ubuntu or a Dockerfile. It could be used instead of Ubuntu on desktops, servers, or clusters, or on embedded boards instead of something like Yocto.
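The workflow, roughly (hedged, with the file name as an example): you edit the system declaration and run `guix system reconfigure config.scm`, which rebuilds and atomically switches the whole OS to the newly declared state, with previous generations still available for rollback from the bootloader.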

As far as I know, no ROS code has been packaged yet for Guix. It does have some disadvantages that may prevent it from ever getting adopted into the ROS ecosystem. For example, Guix packages will never work natively on Windows. Guix is developed by proponents of free open source software, although there are optional Guix channels that contain packages of non-free software.

1 Like

You do make me want to learn more about Android, thank you. So are you saying you do not need Java/Kotlin for the embedded/robotics use cases? Or are you saying that while it is possible to build Android Core without it you would still use it for these cases?

So are you saying you do not need Java/Kotlin for the embedded/robotics use cases? Or are you saying that while it is possible to build Android Core without it you would still use it for these cases?

It depends on the use case. If you only want to use the low-level services of Android, like the HAL, the minimalistic Libc, or Binder, then you can get away without using the ART runtime at all.

If you want higher-level features, like using APK packages to package your ROS nodes (which could also be a good idea depending on the use case), then you will need the ART runtime and also some of the system services written in Java / Kotlin.

That said, I don’t think running Java code on robots is out of the question - I am pretty sure that Java code will in general execute faster than Python.

I guess you’re aware of the open hardware KR260 baseboard. Is that what you’ll be using?

https://antmicro.com/blog/2022/09/kria-ultrascale-plus-som-baseboard/

1 Like

No, we are not, but that’s a nice carrier for the K26 SOM indeed! (Note the KR260 is an AMD product that includes the K26 SOM and their own carrier board.) We are using the plain off-the-shelf KR260, which simplifies building your own RPU. Also, just FYI, the KR260 carrier board is also open hardware: the design files are open at Kria KR260 Robotics Starter Kit and the BSP is robust enough to build on top of.

I’d encourage interested folks to look at AMD’s design. It’s rather complete and has the PL routed to 2x Ethernet PHYs, which allows you to do all sorts of wicked things directly from the FPGA (including TSN).

1 Like

The RISC-V SOM (in RPi CM3 dimensions) hosted in this baseboard with a Coral M.2 might work for more hobbyist-grade projects. It looks like ROS 2 can run on RISC-V. I wonder if anyone will squeeze a P650 into the CM3 footprint.

It wouldn’t get you an NPU, of course, although the P920 might deliver that in time.

Of course it can. We worked with Microchip to enable ROS 2 Humble on an SoC that packages together RISC-V cores and an FPGA; see ROS 2 Humble in Microchip PolarFire® SoC FPGA Icicle Kit with Yocto. The PolarFire SoC is also a pretty interesting compute baseline, but we ended up picking the KR260: it’s a more robust board for robotics, with tons of I/O and possibilities.

2 Likes

Interesting article/ work! I’ll have a proper read.

I’m new to ROS in general, but starting to see the limitations of Ubuntu. I’m also a big open hardware fan so this thread has been a useful prompt for me to start looking at RiscV stuff.

I’ll try not to hijack your thread further. I am interested in looking at what the most capable open hardware ‘RPU’ would be. It looks like the P920 might be a bit of a game changer. Do you have a sense of when it will be available?

I have set up a BoF session at ROSCon 2022 to discuss #meta-ros; see Meta-ros BoF at ROSCon2022.
Everyone is very welcome to join.