Adding Bazel as an official alternative build system for ROS2 Iron?

Hi Everyone,

I think there are many use cases where using Bazel to provide completely controlled, traceable, and reproducible builds would be very beneficial, especially when building safety-critical systems.

Luckily, I am not the only one who thinks so. There are at least two publicly available projects that provide Bazel rules for building ROS2 and ROS2-based apps.

  1. Apex.AI just announced theirs in a great presentation at ROSCon:
    Bazel and ROS 2 – building large scale safety applications on Vimeo
    Their in-progress code is here: GitHub - ApexAI/rules_ros: ROS 2 rules for Bazel

  2. There is also an alternative ROS2 Bazel implementation: GitHub - mvukov/rules_ros2: Build ROS2 with Bazel

I opened this topic to see if there would be interest in having an official Bazel build implementation in the next release of ROS2.

I would also like to understand what the steps are to get to this point (apart from doing the actual implementation). I assume a REP needs to be created and accepted by the TSC.

Some clarifications / summary based on the discussion below:

Non-Goals:

  • Adding Bazel build files to every ROS package repository
  • Requiring package maintainers (or anyone else) to implement Bazel support, even if they are not interested / have no resources to do so.

Goals:

  • Create a separate repository under the ROS umbrella to host Bazel build rules for ROS packages and their external dependencies.
  • Create a community-run working group to organize activities, like the existing Safety working group.
  • Unify the different efforts of providing Bazel build rules for ROS2 and reduce the fragmentation of resources.
  • Provide a way for projects/teams with low DevOps budgets to implement a Bazel-based build system with limited investment.
  • Release the Bazel build rules together with a ROS2 release as a development component: the build rules are tagged together with the release and made available for download, but no binary releases are provided that are built using these build rules.

Looking forward to your comments and support!

Kind regards,
Gergely

11 Likes

A REP would be a good way to start. But before you start on that, you need to better define the goal. That is, I can imagine at least 3 different goals here:

  1. Compile Bazel packages from source along with the rest of the CMake-based ROS 2 packages. That would be something along the lines of GitHub - colcon/colcon-bazel: An extension for colcon-core to support Bazel projects, and maybe the Apex talk (though I haven’t watched it yet).
  2. Compile all of ROS 2 from source with Bazel, not using colcon or CMake at all.
  3. Be able to build and release packages built with Bazel on the buildfarm.

Once you have an idea of exactly what you want to accomplish, then a REP makes sense.

3 Likes

Thank you @clalancette!

To clarify, my main target is option 2: to build ROS-based systems without Colcon and CMake, with Bazel as the main build system. Both Apex.AI’s and @mvukov’s solutions also target this scenario.

Kind regards,
Gergely

1 Like

Hi All:
As you pointed out in your initial post, @kisg, the goal is a controlled, traceable, and reproducible build, as is necessary for safety-critical systems. Using Bazel for this purpose might be the most obvious approach, but it is, strictly speaking, an implementation detail.

Spoiler alert: from here on my post is biased!

After deciding to go with Bazel (in the absence of other practical solutions) we can pinpoint the goals more precisely. For a REP we can then select from these:

  • completely build from source
  • precisely pin all external dependencies
  • have complete knowledge about what was used to build an application
  • have an efficient and precise caching mechanism
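The "precisely pin all external dependencies" goal can be illustrated with a minimal Python sketch. It is not taken from any of the projects mentioned in this thread; it only shows the idea of recording a SHA-256 digest for each external archive and refusing to build when the content changes. Bazel's `http_archive` rule exposes the same idea through its `sha256` attribute. The names `LOCK`, `verify_pinned`, and the archive content are hypothetical.

```python
import hashlib


class PinMismatchError(Exception):
    """Raised when an archive does not match its pinned digest."""


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_pinned(name: str, data: bytes, pinned_sha256: str) -> None:
    """Fail loudly if the archive content differs from what was pinned."""
    actual = sha256_hex(data)
    if actual != pinned_sha256:
        raise PinMismatchError(
            f"{name}: expected {pinned_sha256}, got {actual}"
        )


# Hypothetical stand-in for a downloaded source tarball.
archive = b"pretend this is the content of a third-party source tarball"

# Hypothetical lock table: dependency name -> digest recorded at pin time.
LOCK = {"yaml_cpp": sha256_hex(archive)}

# Passes silently because the content matches the pinned digest.
verify_pinned("yaml_cpp", archive, LOCK["yaml_cpp"])
```

Any change to the archive, even a single byte, makes `verify_pinned` raise, which is what turns "we think we used version X" into "we know exactly which bytes were used".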

A few of these goals are fulfilled simply by using Bazel correctly, instead of Colcon and CMake. Some other topics require a careful design that will support those goals (repo setup, message generation, ament plugin magic). Those designs are potentially in opposition to solutions currently in place in the ROS ecosystem.

One critical point I see is the additional maintenance burden for repo maintainers, since they would have to maintain the build configuration twice, once for CMake and once for Bazel.

Kind regards
Kilian

6 Likes

This is my largest concern. I worked on a project that at one point supported both CMake and Bazel. It did not go well; the different build systems had different bugs, and would produce slightly different artifacts. Imposing this on all ROS packages seems like it is going to lead to a lot of problems.

That doesn’t mean I wouldn’t consider it, but any REP that proposes adding a build system would have to propose some way to deal with this.

3 Likes

Another option: drake-ros/bazel_ros2_rules at main · RobotLocomotion/drake-ros · GitHub

To use this option you add a repository rule to your WORKSPACE file that points to a ROS workspace. This can be a system install of ROS like /opt/ros/<distro> or a tarball. Your targets can then depend on packages in that ROS workspace with labels like @ros2//:rclcpp_cc, or @ros2//:visualization_msgs_py, etc.

You don’t have to rewrite the build system for packages using ROS 2; however, if you want “reproducible” and “traceable” you’re responsible for building and hosting your own tarball somewhere. Further, the ROS packages are going to depend on system dependencies, which (if I understand correctly) isn’t “hermetic” or “traceable” in Bazel.

1 Like

I’m a big fan of bazel - at Cruise we completely replaced catkin+cmake with bazel. The benefits of reproducibility are huge, and cloud based build execution/caching led to an order of magnitude improvement in developer productivity. But it was a massive undertaking, and my experience was that bazel does not play nicely with other build and packaging systems at all. To fully achieve the benefits we had to write bazel build rules for all our dependencies too.

That said, I don’t think ROS should support two parallel build systems - the maintenance burden of two build systems in an open source ecosystem that already struggles to find enough maintainers would be too high (e.g. kubernetes removed bazel as a secondary build system).

I’m also skeptical that bazel would be the right choice as the default build system for an open source framework/ecosystem such as ROS. In my experience bazel is pretty popular within companies large enough to support a dedicated build/CI team, but are there examples of successful non-Google open source libraries or frameworks (i.e. not applications) using bazel?

Finally, there isn’t much point writing a REP unless someone has a plan for how to staff such a project. It would require sweeping changes across hundreds of ROS repos, changes to the build farm, etc. Perhaps Apex and others would be willing to contribute, but it would be worth having that discussion up front before we bother debating implementation details.

10 Likes

I just want to echo Adrian’s point.

This conversation seems to pop up every few months. What people don’t seem to realize is the amount of work required to transition the core ROS source code, build farm, infrastructure, and packages to a new build system. The ROS project is a big ship to steer; it can’t turn on a dime. We have to consider the needs of everyone: students, researchers, governments, and companies. All of these organizations would get dragged along if we were to transition the build system. Moreover, the costs of transitioning to Bazel in terms of developer time, resources, and opportunity costs are huge. I would estimate in the millions, if not tens of millions of dollars. Any proposal that doesn’t realistically address these costs is a non-starter.

4 Likes

As a “regular” ROS user and maintainer of a few (non-core) packages, I’d need very strong reasons to accept such a change. Maintaining two concurrent build systems, one of which I don’t regularly use, is also not an option for me.

In what way are CMake builds not reproducible? Can you point to a writeup explaining what actual problems this creates for hobby/research/commercial projects?

2 Likes

Dear Adrian,

thank you for your comments.

I agree with you on a lot of points:

  • Adopting Bazel for ROS can have a hugely positive effect on developer productivity (both on the personal and also the team level)
  • Adopting Bazel for ROS is a huge project, and requires substantial investment from the team who wants to adopt it - both in time and effort.

At the same time, a lot of people and teams in our community are already trying to do it. With your example included, just in this thread, I can count 5 different attempts to somehow get Bazel to work well with ROS - and I think there are many more examples in the community.

I agree with you that Bazel would not be the right default build system for ROS, for many reasons.

But I do believe that an officially supported alternative Bazel-based build system would provide a lot of benefits to the part of the community that is interested in using it:

  • it could significantly reduce the effort needed to introduce Bazel from a “DevOps Team” to a “DevOps Person”.
  • no need to reinvent the wheel in every team (e.g. how do we generate messages).

Regarding Kubernetes removing Bazel support: They could simply remove it because they no longer needed it. Kubernetes is almost exclusively written in Go, and Go has its own integrated build system, which now includes improvements that essentially nullified the advantages of using Bazel.

ROS is a very different beast, with many different languages, where C and C++ are used in the core.
It also has many different external dependencies, which need to be pulled into a single build system in order to have a fully controlled build environment - this matches your experience at Cruise. This complexity requires a build system that is designed for such complexity, like Bazel.

Regarding staffing: I agree with you, that adding Bazel support to every single ROS package is a huge undertaking. However, while this would be the ideal solution, we could start with much less and already offer a lot of benefits to anyone who wishes to use Bazel with ROS.

At the minimum we would need:

  • Bazel rules to build the core ROS components (e.g. rcl, rclcpp, rclpy, basic tools, the default RMW implementation, … etc.)
  • Support for message generation
  • An easy, documented way to add new packages and their dependencies

Many of these are already implemented in some of the listed projects.

We also need the solution to be modular and easy to customize for a team’s use case. E.g. it should be easy to remove stuff that a team does not need (e.g. RMW implementations they don’t use)

In my vision, these rules would reside in one or more separate repositories, just like the current external rules are hosted separately from the rest of ROS. This way no extra burden is placed on the package maintainers, who might not be interested in Bazel in the first place.

This point should also provide an answer to the “who will maintain this” question: those who want to use it. In my experience, participation in open-source projects is driven by interest in a topic and the need to have something one can use.

The core of my idea is to pool the resources of those who are already working on this problem, each in their own sandbox. By working together we could come up with a template and useful building blocks that everyone can use in their own projects as the common base.

A second phase could include a semi-automated way to generate Bazel rules for packages built with Ament / CMake. This would make it easy to keep the already converted ROS packages up-to-date and also to convert new packages. While this might seem far-fetched, we already solved a similar problem to generate Java bindings for the whole iOS API completely automatically in the Multi-OS Engine. There is already a limited example available for LLVM’s CMake files, and there is also Bazel Gazelle, just waiting for CMake support.
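As a rough illustration of what "semi-automated" rule generation could look like, here is a minimal Python sketch. It is an assumption-laden toy, not how Gazelle or any of the rule sets in this thread work: it reads only the dependency tags of a `package.xml` manifest (a real converter would also have to interpret `CMakeLists.txt`), and the `//<package>` label layout and source globs are hypothetical. `cc_library` and `glob` are standard Bazel constructs.

```python
import xml.etree.ElementTree as ET

# Manifest tags treated as compile-time dependencies in this sketch.
DEP_TAGS = ("depend", "build_depend", "build_export_depend")

# Hypothetical example manifest.
EXAMPLE_PACKAGE_XML = """<package format="3">
  <name>demo_talker</name>
  <buildtool_depend>ament_cmake</buildtool_depend>
  <depend>rclcpp</depend>
  <depend>std_msgs</depend>
</package>"""


def package_deps(package_xml: str):
    """Return (package name, sorted dependency names) from a manifest."""
    root = ET.fromstring(package_xml)
    name = root.findtext("name").strip()
    deps = sorted(
        {
            el.text.strip()
            for tag in DEP_TAGS
            for el in root.findall(tag)
            if el.text
        }
    )
    return name, deps


def build_stub(package_xml: str) -> str:
    """Emit a BUILD file stub; labels and globs are assumed conventions."""
    name, deps = package_deps(package_xml)
    lines = [
        "cc_library(",
        f'    name = "{name}",',
        '    srcs = glob(["src/**/*.cpp"]),',
        '    hdrs = glob(["include/**/*.hpp"]),',
        "    deps = [",
    ]
    lines += [f'        "//{dep}",' for dep in deps]
    lines += ["    ],", ")"]
    return "\n".join(lines)


print(build_stub(EXAMPLE_PACKAGE_XML))
```

A generated stub like this would still need human review, which is why "semi-automated" is the realistic target: the tool keeps the dependency lists in sync, and maintainers of the rule repository handle the special cases.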

I hope that I could provide answers to most of your concerns. I am looking forward to continuing the discussion.

Kind regards,
Gergely

2 Likes

Dear Katherine,

thank you for your comment.

I think I already covered most of the concerns that you raise in my reply to Adrian:

  • In my vision, the Bazel support would be hosted separately, so no cost is paid by package maintainers and community members who don’t want to use it.
  • The goal is to pool the currently fragmented resources of those who are already working on this problem.
  • We would start small, and provide immediate value to those who would like to use Bazel with ROS (e.g. Bazel rules for core packages and message generation + an easy-to-customize project template)
  • We would listen for feedback and add what is missing for most users.
  • We would incrementally add more and more rules for packages that are of interest to the members of the community.
  • When there is enough momentum, we could invest in tools that make the maintenance of packages easier (e.g. semi-automatic Bazel rule generation from Ament + CMake files).

Maybe forming a community working group would be the best way to organize this work?

Kind regards,
Gergely

Dear Martin,

thank you for your comment.

I think I already addressed your main concern in my previous replies: you would not have to do anything if you are not interested in using Bazel.

Regarding reproducibility and CMake: If you just run CMake, by design it will crawl a lot of files on your machine looking for installed dependencies. It will also use the compiler that was installed on your system.

This means that if I check out the same commit of your package and build it on my system, the resulting binary will most likely be different (I might have different versions of the dependencies installed or even a different compiler).

This is usually not a problem in the general case, but a pretty big issue when we want to build a system where every build output has to be traced back to all of its inputs, and we want every build to be fully reproducible.

This is what Bazel makes much easier: it supports hermetic builds, which means that every tool it uses is managed by Bazel and that it won’t touch (and will not allow any tool to touch) any files outside of the workspace that it controls.

Bazel also has a lot of other features that CMake does not, e.g. remote caching of build artifacts to speed up builds.
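To make the link between hermeticity and caching concrete, here is a minimal Python sketch of the underlying idea. It is a simplified illustration and not Bazel's actual algorithm: every declared input of a build step (source contents, compiler identity, flags) is hashed into one key, so the same inputs always give the same key and any change in the compiler gives a different one. The function name `action_key` and the compiler strings are hypothetical.

```python
import hashlib
import json
from typing import Dict, List


def action_key(sources: Dict[str, bytes], compiler: str, flags: List[str]) -> str:
    """Digest every declared input of a build step into a single key."""
    record = {
        # Hash file contents, not timestamps or absolute paths.
        "sources": {
            path: hashlib.sha256(content).hexdigest()
            for path, content in sorted(sources.items())
        },
        "compiler": compiler,
        "flags": flags,
    }
    blob = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


srcs = {"talker.cpp": b"int main() { return 0; }"}

# Same commit, two machines with different (hypothetical) compilers.
key_a = action_key(srcs, "gcc-11.4.0", ["-O2"])
key_b = action_key(srcs, "gcc-12.3.0", ["-O2"])

print(key_a == action_key(srcs, "gcc-11.4.0", ["-O2"]))  # True
print(key_a == key_b)  # False
```

A key like this only works as a cache lookup if nothing undeclared (for example a library picked up from /usr) can influence the output, which is why a remote cache and hermetic builds go together.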

Kind regards,
Gergely

PS: It is, of course, possible to e.g. use APTLY to create a frozen Ubuntu repository, and then build a Docker container out of it, and use that container to build our packages. But this already requires substantial effort on our side, and we are hacking something together, instead of using a tool that was designed for this use case.

3 Likes

@kisg a community-maintained set of bazel rules for working with ROS, that does not require changes to existing repos, consolidating various efforts around this, sounds like a great idea.

Most of my criticism was regarding a wholesale adoption of bazel as a primary or secondary build tool within existing repos.

I don’t think it needs to be labeled “official” (I’m not aware of any such definition of “official” in ROS land anyway), it can simply be a project maintained by interested parties in the community.

2 Likes

Bazel sounds like a great build system! I think that the mention of generating Bazel rules from the existing rules is a good approach. A similar direction was taken by the folks at Clearpath, but they decided to use Nix to ensure reproducibility of the entire build, without switching out CMake.

I’ll be happy to contribute, however little it is, to a Bazel project!

1 Like

I’ve been curious about hermetic builds recently, as I have a box that only has Ubuntu 18.04 support from the vendor, but I’d like to deploy Humble on it without resorting to containers.

One thing I have been looking at is specifying the toolchain to CMake, which should be doable through colcon, although I haven’t tried it yet. As for finding system libraries over workspace ones, that is definitely a problem: we recently ran into a case where the system yaml-cpp was picked up instead of the one from yaml_cpp_vendor, despite the system package being the wrong version.

What I’m getting at here is that as attractive as Bazel is on paper, we might be able to get some (not all) of its main advantages without changing the whole build system.

The existing ROS build system and practices are not suitable for safety-critical applications. The original author has pointed this out, but I think it is important to reiterate it again and again. I don’t want it missed here, as these topics are a safety issue. Fortunately, they come up regularly.

I hope the Bazel initiative gets some traction. Good luck!

1 Like

I suspect if ROS 2 started asking package maintainers to maintain Bazel build files for their packages, we would see fewer people willing to maintain packages in the ROS 2 ecosystem.

Writing BUILD files for a package with only first-party code isn’t too hard, but it gets very tricky when integrating third-party libraries like OpenCV, OGRE, ffmpeg etc. This is especially true if you want builds to be hermetic, because that eliminates some of the easy tricks like linking directly to libraries in /usr.

Existing ROS build system and practices are not suitable for safety-critical applications.

I think it would help to identify specific deficiencies here.

2 Likes

Dear @james-foxglove,

thank you for your insights.

I suspect if ROS 2 started asking package maintainers to maintain Bazel build files for their packages, we would see fewer people willing to maintain packages in the ROS 2 ecosystem.

As it was already discussed above, this is not how this proposal is envisioned.

Writing BUILD files for a package with only first-party code isn’t too hard, but it gets very tricky when integrating third-party libraries like OpenCV, OGRE, ffmpeg etc. This is especially true if you want builds to be hermetic, because that eliminates some of the easy tricks like linking directly to libraries in /usr.

This is exactly why I think pooling resources is so important: to also handle all the external dependencies of the more complex packages.

I also think that taking shortcuts like linking externally built libraries or even using the bazel_cmake ruleset to invoke an external CMake build from Bazel is the wrong direction: the former gives up the control that we want to achieve with Bazel, the latter adds complexity and unknown variables to an already complex problem.

Existing ROS build system and practices are not suitable for safety-critical applications.

I think it would help to identify specific deficiencies here.

I think control, traceability, and reproducibility are the key terms here. With Bazel, this is much easier to achieve due to its design than with a Colcon + CMake-based build system, especially if we throw Debian / Ubuntu packages and bloom-generate into the mix.

To be clear: I am not saying that it is not possible to create a reasonably controlled build environment with the current tools + some additional tooling, e.g. well-defined build containers where the Ubuntu package repositories are also frozen in time.

I think, however, that when a Bazel-based unified build system is properly set up for a project, it will perform better (in terms of build times and developer UX), be less dependent on external (OS level) tooling, and be less error-prone than the alternative.

It is also important to understand that when a single system has a complete understanding of the whole build (which is not the case with Colcon + CMake, where each package is built separately), a lot of other possibilities open up, e.g.:

  • SBOM generation - AFAIK soon to become a requirement by US authorities in certain
  • a new level of IDE and code analysis tooling integration.
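The SBOM point follows directly from having the whole dependency graph in one place. The Python sketch below is a toy illustration under stated assumptions: the graph is a hand-written, heavily simplified stand-in (the package `my_robot_app` is hypothetical, and real ROS packages have many more dependencies), and the output is only a component list, not a real SBOM format.

```python
from typing import Dict, List

# Hypothetical, heavily simplified dependency graph: target -> direct deps.
GRAPH: Dict[str, List[str]] = {
    "my_robot_app": ["rclcpp", "std_msgs"],
    "rclcpp": ["rcl"],
    "std_msgs": [],
    "rcl": ["rmw"],
    "rmw": [],
}


def transitive_deps(graph: Dict[str, List[str]], root: str) -> List[str]:
    """Return every component reachable from root, sorted by name."""
    seen = set()
    stack = list(graph[root])
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        # Unknown nodes are treated as leaves in this sketch.
        stack.extend(graph.get(node, []))
    return sorted(seen)


print(transitive_deps(GRAPH, "my_robot_app"))
# ['rcl', 'rclcpp', 'rmw', 'std_msgs']
```

With per-package builds, no single tool holds a graph like `GRAPH` for the whole application, so a list like this has to be reconstructed from several sources; with one build system owning the full graph, it is a plain traversal.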

Currently, this is only available to projects/teams whose DevOps budget allows this sizable investment.

The goal of this proposal is to open up this world to smaller projects/startups/researchers, who have minimal DevOps resources (e.g. single person “DevOps Team”).

Kind regards,
Gergely

I added some clarification/summary to the original post based on the discussion here.

@amacneil By “official” I mostly meant what I wrote in the summary:

  • Create a community working group and a repository under the ROS umbrella, like there is a Safety working group.
  • Release the Bazel rules together with the ROS releases, e.g. proper tags/release branches in the repository that are expected to work with the matching ROS releases.

Hello,

I see that I don’t have to reiterate the benefits of using Bazel. :) I’ll just add that it’s maybe not for everyone; some folks may be perfectly happy using precompiled apt packages. Just a side note: Bazel by default doesn’t solve everything: stuff from /usr can still sneak into your builds. But that can be solved by e.g. using a common dev Docker image within a team. Another alternative is to use a custom Bazel toolchain that works on a separate rootfs.

Handling some 3rd party packages directly with Bazel is tricky. GitHub - bazelbuild/rules_foreign_cc: Build rules for interfacing with "foreign" (non-Bazel) build systems (CMake, configure-make, GNU Make, boost, ninja) is there to help.

I am also not sure (as pointed out above) that supporting two build systems for ROS2 is feasible, e.g. maintaining Bazel rules in a bunch of repos. I am more in favor of the monorepo approach that I started in my rules_ros2 repo and that the Apex folks are developing.

Having a community working group for this is something I would definitely support. At least in my case that would help to set a direction, since I don’t have a concrete one yet. :)

Cheers,
Milan

2 Likes