Announcing ZigROS: build ROS with the Zig build system

Hello all,

I’d like to share ZigROS, a project I’ve been tinkering away at for a few months. ZigROS uses the Zig toolchain to build rcl and rclcpp, along with all of their dependencies, directly.

ZigROS is focused on creating static builds for simplified deployments, and the Zig tooling makes this very straightforward: out of the box, the Zig toolchain supports static and dynamic linking against arbitrary glibc versions as well as musl, and it supports cross compilation.

ZigROS can build the rclcpp stack in under 2 minutes on modern hardware and produce statically linked binaries under 5MB that run standalone on any Linux system.

The Zig toolchain puts emphasis on being self-contained, and ZigROS follows that ethos. The only dependency is the Zig compiler; all other build and runtime dependencies (including Python) are brought in and built.

Check out this example repo to see it in action.

Currently the project handles rcl, rclcpp, and all the needed message generation for interfaces. Only the base rclcpp is supported so far (no actions or lifecycle support just yet) but I do plan on continuing to improve feature support.

Given that the main goal of this project is ease of static builds, not all ROS projects will be compatible. For starters, anything with a runtime dependency on Python or on shared libraries won’t work. It also requires providing a build.zig file to build the project, though the API there is very straightforward.

I invite anyone curious about Zig, or anyone interested in experimenting with static builds, to give it a try. Setup is as straightforward as installing Zig 0.13 and building the example repo.

For deployments to edge hardware like a Raspberry Pi or a similar embedded Linux system, the static linking and ease of cross compilation offered here could be a valuable alternative to the main build system.

Let me know if there are any questions or if you’re interested in collaborating.

Thanks!


This is really nice work, thanks for making it available.

It’s obviously also interesting in the context of the discussion around Nix.

ZigROS is capable of building the rclcpp stack in under 2 minutes on modern hardware, and producing statically linked binaries under 5MB that will run standalone on any linux systems.

The readme and design doc also state something like this (feature creep, massive workspaces, etc.), which I thought deserves some nuance: even without something like Zig / ZigROS it’s possible to create minimal builds of ROS 2.

The binaries created by micro-ROS, for instance (also a single-target, static, no-dependencies build), can be kilobytes in size. I personally maintain a build/dev environment which ends up creating a statically linked ROS 2 node of about 1 MB. Both of these are ‘only’ RCL(C) (and its/their dependencies), but it’s all just normal Colcon, vcs and vanilla cross-compiling GCC toolchains.

(it’s nice to see you’re using similar/the same approaches btw, such as with the RMW selection)

Having written that: reading the Zig documentation, it almost seems like the cross-compilation / toolchain management side to it is a nice-to-have tool, but it’s really about the language.

What’s your main interest in Zig?


Edit: looks like @jacob did some work on an rclzig: jacobperron/rclzig.

Can I use it to build ROS on Windows? Building from source on Windows is never easy. I guess the only supported C++ compiler on Windows is Visual C++; is zig build closer to GCC or Clang?

Is there any documentation on how to set up a repository to do this? I see some CMake flags, but do we need to ensure this is done on all parent workspaces, or does it work to enable it just in your nodes?

Does this mean Python nodes can take advantage of this build process to run against a ZigROS build of ROS?

This is really cool to see. I know that at one point, the work to decouple the Zig toolchain from LLVM meant a planned regression on support for C++, Objective-C and a few different architectures. Do you know if C++ support is expected to remain in the Zig toolchain after they move away from LLVM?

Thanks for the comments!

I want to state up front that I don’t intend this effort to ever really be a replacement for the standard ROS tooling. It’s more to provide a Zig-friendly experience, and to offer an alternative out-of-the-box experience for those interested in minimized installs. The scope here is quite a bit smaller than what the Nix discussion has going on.

This package was an offshoot of development while I was working on a Zig client library. My main interest in Zig is to show that it’s a useful language for robotics, and hopefully to start building a community around Zig’s application in robotics. My version of an rclzig wrapper will be released in the near future. ZigROS was originally my way of making rclzig development easier, but I figured other folks might be interested in the build system side of things, so I opted to release it as separate functionality.

My understanding is that micro-ROS targets RTOSes and doesn’t use the rclcpp client library or the standard RMW implementations (which are also C++), but correct me if I’m wrong here. A lot of the size comes from rclcpp; if you link against just rcl you can at least halve it. My current rclzig builds, for example, which don’t bring in rclcpp, come out to around 1.8MB when built statically.

My comparisons to large workspaces and dependency creep in the documentation are about the “out of the box” experience of ROS (for example, the build-from-source documentation ROS provides pulls in a workspace that builds several RMWs, Python packages, rqt/rviz, etc.). I’m aware that you can tailor this experience if you know the details of the ROS build system. My goal for ZigROS is to provide a better “out of the box” experience for making minimal deployments. I could clarify this in the documentation if you think what’s there currently comes off as a dig at the ROS tools; that’s not my intent. Would you be willing to share your statically linked colcon workflow? I wasn’t able to find much info on this when doing my research.

I spoke to William Woodall at ROSCon 2023 about how Zig could integrate its allocator interface with the rcl allocator abstraction. He mentioned Jacob’s work on a Zig wrapper there too, but it’s been inactive for a while. I hope to clean up a few rough edges on my rclzig implementation shortly and will be switching to “developing it in the open” to attract new interest to the project.
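For context on why that integration seems plausible: rcl takes allocators as a plain C struct of function pointers plus an opaque `state` pointer (`rcl_allocator_t` is an alias of `rcutils_allocator_t`), and Zig's `std.mem.Allocator` is likewise a context pointer plus a table of functions. Below is a minimal C++ sketch of threading an allocator object through `state`, assuming a locally defined `allocator_sketch_t` that mirrors the rcutils field layout (it is not the real header) and a hypothetical `CountingAllocator` standing in for whatever would back a Zig allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Local mirror of the rcutils_allocator_t layout (assumed to match
// rcutils/allocator.h); NOT the real header.
struct allocator_sketch_t
{
  void * (*allocate)(std::size_t size, void * state);
  void (*deallocate)(void * pointer, void * state);
  void * (*reallocate)(void * pointer, std::size_t size, void * state);
  void * (*zero_allocate)(std::size_t number_of_elements, std::size_t size_of_element, void * state);
  void * state;
};

// Hypothetical allocator object; a Zig binding would put its own object here.
struct CountingAllocator
{
  std::size_t live_allocations = 0;
};

// Recover the allocator object from the opaque state pointer.
static CountingAllocator & self(void * state)
{
  return *static_cast<CountingAllocator *>(state);
}

static void * counting_allocate(std::size_t size, void * state)
{
  void * p = std::malloc(size);
  if (p != nullptr) {
    ++self(state).live_allocations;
  }
  return p;
}

static void counting_deallocate(void * pointer, void * state)
{
  if (pointer != nullptr) {
    assert(self(state).live_allocations > 0);  // more frees than allocations is a bug
    --self(state).live_allocations;
  }
  std::free(pointer);
}

static void * counting_reallocate(void * pointer, std::size_t size, void * state)
{
  void * p = std::realloc(pointer, size);
  // realloc(nullptr, size) acts like malloc, so count it as a new allocation.
  if (pointer == nullptr && p != nullptr) {
    ++self(state).live_allocations;
  }
  return p;
}

static void * counting_zero_allocate(std::size_t count, std::size_t size, void * state)
{
  void * p = std::calloc(count, size);
  if (p != nullptr) {
    ++self(state).live_allocations;
  }
  return p;
}

// Bundle the object with the functions; `impl` must outlive the returned struct.
static allocator_sketch_t make_counting_allocator(CountingAllocator & impl)
{
  return allocator_sketch_t{
    counting_allocate, counting_deallocate, counting_reallocate,
    counting_zero_allocate, &impl};
}
```

Every call goes through `alloc.allocate(n, alloc.state)` and friends, so the C side never needs to know what kind of object sits behind `state`; that indirection is the piece a Zig wrapper would fill in.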

In theory this would allow for simple Windows builds as well; however, there are a few places in the code base where Windows support hasn’t been added (or, if it has, it is untested). I initially tried to support Windows but ran into some weird edge cases doing cross compilation from Linux. At the time I didn’t have a Windows machine, but I recently picked one up, so I may go back and see if I can build for Windows again, though that’s pretty low on my radar.

Zig wraps LLVM/Clang for C/C++ builds. It supports Windows via MinGW when cross compiling, and via MSVC or MinGW when compiling natively on a Windows machine (my understanding is that licensing constraints from Microsoft prevent them from shipping MSVC support generically).

Right now Python is limited to the compile-time requirements for code generation. To achieve this, Zig builds its own version of the base CPython, then the required Python dependencies are brought in. This works okay for the compile-time stuff, since there are only two external dependencies and both are pure Python (lark and empy).

In theory there’s nothing stopping us from applying the same strategy to build the runtime Python packages as well and installing the Zig-built CPython, but I imagine getting all the runtime dependencies in order would be a very challenging task. This is why I consider runtime Python out of scope for the time being.

Zig will be decoupling itself from LLVM for Zig compilation, this is true. However, Andrew has committed to providing a path forward for existing workflows and will be rolling C and C++ builds into their own projects that can still be brought in by the Zig build system. I’m confident that Andrew will make good on his promises here, as there’s a large portion of the Zig user base that’s only here for the amazing tooling, including Uber, one of Zig’s largest sources of income in the past.

Wow that comment is from 2023. I didn’t realize just how out-of-date my knowledge of the toolchain progress had gotten. Thanks for sharing!

If you are interested, we could allocate some time in a future Infrastructure PMC meeting (probably February or March based on our current activity) to discuss how this project came about. In particular, I’m curious what can be done, other than upstreaming (which, whether ultimately positive or negative, would absolutely be disruptive), to help maintainers of alternative build systems like this and NixOS and allow these projects to flourish alongside the officially supported build toolchain.

Yeah, the whole removing-LLVM effort has been a long time coming. There is already a mostly usable LLVM-less option with a handful of backends, which is cool to see. It compiles incredibly fast.

I could be interested in attending to share some insights and background if you think it’d be valuable. Is there interest from the infrastructure team in learning more about Zig tooling? I’ll be honest, I don’t have many qualms with the infra side of things; I can’t think of much I’d need to continue with this experiment alongside the current infrastructure.

A loftier infrastructure goal that I think could be interesting would be offering Zig builds as an option with colcon. Packaging the Zig compiler is trivial. This has crossed my mind for supporting ROS packages written in Zig, if my rclzig project manages to get off the ground, but it could also be used for building C/C++ packages.

I do have some concerns about how much ROS projects love dlopen and how that’ll limit the static build aspect of this effort, but I doubt that’s the right team to advocate to for continued/improved support of static builds.

The dlopen pattern is used quite a bit because of the federated ecosystem. For generic building/packaging with things like apt, it’s helpful to defer some of the decisions to runtime.

That being said, I don’t think that anyone would be opposed to making ROS (or at least up to rclpy/rclcpp) much more amenable to statically-linking, it’s just a function of doing the work. Because of some of the other threads, I recently started toying around with a bazel build of ROS (hacky prototype) and quickly ran into some of the same issues around dlopen, as bazel really likes to statically link as well.

Ideally, we could make it a compile-time switch that is applied consistently across the stack.

Yeah I can appreciate the usefulness of runtime configuration for packaging, tooling, and when starting out. The added flexibility and ability to build and experiment quickly is an important aspect of ROS in my opinion. Where I think things could be improved is streamlining the ability to lock in your system once you’ve sorted out what you’re building.

I think how the RMWs are handled is actually a really good example of this. Out of the box the RMW provides a shim that loads your target RMW at run time. Once you’ve figured out the RMW that makes sense you can drop the shim and link against your RMW of choice directly (statically or dynamically). The typesupport situation on the other hand generates a shim that you can’t really work around, and if you end up wanting to put together a system that requires multi type support you’re out of luck for static linking.

Really, when it comes to core libraries, getting static linking working isn’t too bad as long as you stick to a single typesupport. I’d like to see that use case continue to be supported, at least for packages used when deploying. Tooling that isn’t run directly on the bot during operation is far less critical in my view.

The typesupport situation on the other hand generates a shim that you can’t really work around, and if you end up wanting to put together a system that requires multi type support you’re out of luck for static linking.

For C++ applications, this isn’t necessarily true, so long as you don’t want to use GenericSubscription/GenericPublisher. It’s already required to link against the shim library for each message type that your application needs to use, so also linking against the RMW-specific typesupport library wouldn’t be much of a stretch (or just linking the typesupport library directly into the shim).

Unfortunately, it is true even if you only use typed pubs and subs (the generic interfaces call dlopen on their own, so they’re not static-friendly regardless of your typesupport situation).

Have a look at the generated output from rosidl when multiple typesupports are used. This is taken from the builtin_interfaces Time type for C++ messages with the default dual typesupports you get out of the box (fastrtps and introspection):

// generated from rosidl_typesupport_cpp/resource/idl__type_support.cpp.em
// with input from builtin_interfaces:msg/Time.idl
// generated code does not contain a copyright notice

#include "cstddef"
#include "rosidl_runtime_c/message_type_support_struct.h"
#include "builtin_interfaces/msg/detail/time__functions.h"
#include "builtin_interfaces/msg/detail/time__struct.hpp"
#include "rosidl_typesupport_cpp/identifier.hpp"
#include "rosidl_typesupport_cpp/message_type_support.hpp"
#include "rosidl_typesupport_c/type_support_map.h"
#include "rosidl_typesupport_cpp/message_type_support_dispatch.hpp"
#include "rosidl_typesupport_cpp/visibility_control.h"
#include "rosidl_typesupport_interface/macros.h"

namespace builtin_interfaces
{

namespace msg
{

namespace rosidl_typesupport_cpp
{

typedef struct _Time_type_support_ids_t
{
  const char * typesupport_identifier[2];
} _Time_type_support_ids_t;

static const _Time_type_support_ids_t _Time_message_typesupport_ids = {
  {
    "rosidl_typesupport_fastrtps_cpp",  // ::rosidl_typesupport_fastrtps_cpp::typesupport_identifier,
    "rosidl_typesupport_introspection_cpp",  // ::rosidl_typesupport_introspection_cpp::typesupport_identifier,
  }
};

typedef struct _Time_type_support_symbol_names_t
{
  const char * symbol_name[2];
} _Time_type_support_symbol_names_t;

#define STRINGIFY_(s) #s
#define STRINGIFY(s) STRINGIFY_(s)

static const _Time_type_support_symbol_names_t _Time_message_typesupport_symbol_names = {
  {
    STRINGIFY(ROSIDL_TYPESUPPORT_INTERFACE__MESSAGE_SYMBOL_NAME(rosidl_typesupport_fastrtps_cpp, builtin_interfaces, msg, Time)),
    STRINGIFY(ROSIDL_TYPESUPPORT_INTERFACE__MESSAGE_SYMBOL_NAME(rosidl_typesupport_introspection_cpp, builtin_interfaces, msg, Time)),
  }
};

typedef struct _Time_type_support_data_t
{
  void * data[2];
} _Time_type_support_data_t;

static _Time_type_support_data_t _Time_message_typesupport_data = {
  {
    0,  // will store the shared library later
    0,  // will store the shared library later
  }
};

static const type_support_map_t _Time_message_typesupport_map = {
  2,
  "builtin_interfaces",
  &_Time_message_typesupport_ids.typesupport_identifier[0],
  &_Time_message_typesupport_symbol_names.symbol_name[0],
  &_Time_message_typesupport_data.data[0],
};

static const rosidl_message_type_support_t Time_message_type_support_handle = {
  ::rosidl_typesupport_cpp::typesupport_identifier,
  reinterpret_cast<const type_support_map_t *>(&_Time_message_typesupport_map),
  ::rosidl_typesupport_cpp::get_message_typesupport_handle_function,
  &builtin_interfaces__msg__Time__get_type_hash,
  &builtin_interfaces__msg__Time__get_type_description,
  &builtin_interfaces__msg__Time__get_type_description_sources,
};

}  // namespace rosidl_typesupport_cpp

}  // namespace msg

}  // namespace builtin_interfaces

namespace rosidl_typesupport_cpp
{

template<>
ROSIDL_TYPESUPPORT_CPP_PUBLIC
const rosidl_message_type_support_t *
get_message_type_support_handle<builtin_interfaces::msg::Time>()
{
  return &::builtin_interfaces::msg::rosidl_typesupport_cpp::Time_message_type_support_handle;
}

#ifdef __cplusplus
extern "C"
{
#endif

ROSIDL_TYPESUPPORT_CPP_PUBLIC
const rosidl_message_type_support_t *
ROSIDL_TYPESUPPORT_INTERFACE__MESSAGE_SYMBOL_NAME(rosidl_typesupport_cpp, builtin_interfaces, msg, Time)() {
  return get_message_type_support_handle<builtin_interfaces::msg::Time>();
}

#ifdef __cplusplus
}
#endif
}  // namespace rosidl_typesupport_cpp

Of note is the call to get_message_typesupport_handle_function, which comes from the typesupport dispatch code and calls dlopen.

In the case of typesupports, if more than one is required, typesupport_cpp generates this dynamic-loading version, and rclcpp only works with the typesupport_cpp shim. Unlike the RMW layer, the typesupports all have unique symbols, so you need the shim to go from typesupport_cpp to your typesupport of choice. To fix static linking for multiple typesupports, the typesupport_cpp generator would need to be rewritten.
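To make the rewrite idea concrete: the generated map stores typesupport identifiers and symbol *names*, and the dispatch function turns a name into a function at run time via dlopen. A generator aimed at static builds could instead emit the function pointers themselves, so the linker sees a direct reference to each typesupport. The C++ sketch below is hypothetical throughout: `type_support_handle`, `static_type_support_map` and `get_handle_by_identifier` are simplified stand-ins for the rosidl structures in the generated code, not their real definitions, and the two getters stand in for the per-typesupport symbols.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Simplified stand-in for rosidl_message_type_support_t (hypothetical).
struct type_support_handle
{
  const char * typesupport_identifier;
  const void * data;  // would point at the typesupport-specific callbacks
};

// Stand-ins for per-typesupport symbols such as
// rosidl_typesupport_fastrtps_cpp__get_message_type_support_handle__builtin_interfaces__msg__Time.
// In a real static build these would live in the typesupport libraries.
const type_support_handle * fastrtps_time_handle()
{
  static const type_support_handle handle{"rosidl_typesupport_fastrtps_cpp", nullptr};
  return &handle;
}

const type_support_handle * introspection_time_handle()
{
  static const type_support_handle handle{"rosidl_typesupport_introspection_cpp", nullptr};
  return &handle;
}

using handle_getter = const type_support_handle * (*)();

// Same shape as type_support_map_t, but holding function pointers
// instead of symbol names to be resolved with dlopen/dlsym later.
struct static_type_support_map
{
  std::size_t size;
  const char * const * typesupport_identifier;
  const handle_getter * getters;
};

static const char * const time_ids[] = {
  "rosidl_typesupport_fastrtps_cpp",
  "rosidl_typesupport_introspection_cpp",
};

static const handle_getter time_getters[] = {
  fastrtps_time_handle,
  introspection_time_handle,
};

static const static_type_support_map time_map = {
  sizeof(time_ids) / sizeof(time_ids[0]), time_ids, time_getters};

// Plays the role of get_message_typesupport_handle_function without dlopen:
// find the identifier, then call the getter directly.
const type_support_handle * get_handle_by_identifier(
  const static_type_support_map & map, const char * identifier)
{
  assert(identifier != nullptr);
  for (std::size_t i = 0; i < map.size; ++i) {
    if (std::strcmp(map.typesupport_identifier[i], identifier) == 0) {
      return map.getters[i]();
    }
  }
  return nullptr;  // requested typesupport was not linked in
}
```

The trade-off under this sketch's assumptions is that every typesupport listed in the map must be linked into the binary, so the identifiers and getters would have to be generated from the same compile-time typesupport selection.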

Here you can see the unique symbols from each type support:

# nm -gDC ./libbuiltin_interfaces__rosidl_typesupport_cpp.so | grep get_message_type_support_handle
0000000000001184 T rosidl_message_type_support_t const* rosidl_typesupport_cpp::get_message_type_support_handle<builtin_interfaces::msg::Time_<std::allocator<void> > >()
0000000000001139 T rosidl_message_type_support_t const* rosidl_typesupport_cpp::get_message_type_support_handle<builtin_interfaces::msg::Duration_<std::allocator<void> > >()
000000000000114a T rosidl_typesupport_cpp__get_message_type_support_handle__builtin_interfaces__msg__Duration
0000000000001195 T rosidl_typesupport_cpp__get_message_type_support_handle__builtin_interfaces__msg__Time

# nm -gDC ./libbuiltin_interfaces__rosidl_typesupport_introspection_cpp.so | grep get_message_type_support_handle
000000000000228b T rosidl_message_type_support_t const* rosidl_typesupport_introspection_cpp::get_message_type_support_handle<builtin_interfaces::msg::Time_<std::allocator<void> > >()
00000000000021a7 T rosidl_message_type_support_t const* rosidl_typesupport_introspection_cpp::get_message_type_support_handle<builtin_interfaces::msg::Duration_<std::allocator<void> > >()
00000000000021b8 T rosidl_typesupport_introspection_cpp__get_message_type_support_handle__builtin_interfaces__msg__Duration
000000000000229c T rosidl_typesupport_introspection_cpp__get_message_type_support_handle__builtin_interfaces__msg__Time

# nm -gDC ./libbuiltin_interfaces__rosidl_typesupport_fastrtps_cpp.so | grep get_message_type_support_handle
00000000000032b9 T rosidl_message_type_support_t const* rosidl_typesupport_fastrtps_cpp::get_message_type_support_handle<builtin_interfaces::msg::Time_<std::allocator<void> > >()
00000000000028b1 T rosidl_message_type_support_t const* rosidl_typesupport_fastrtps_cpp::get_message_type_support_handle<builtin_interfaces::msg::Duration_<std::allocator<void> > >()
00000000000028c2 T rosidl_typesupport_fastrtps_cpp__get_message_type_support_handle__builtin_interfaces__msg__Duration
00000000000032ca T rosidl_typesupport_fastrtps_cpp__get_message_type_support_handle__builtin_interfaces__msg__Time

This is why you can’t remove the shim, and the way the shim is implemented means it can’t be statically linked when more than one typesupport is required.

And for contrast here you can see why the runtime loader for the RMWs can be bypassed so easily:

# nm -gDC /opt/ros/jazzy/lib/librmw_implementation.so | grep rmw_create_node
0000000000005040 T rmw_create_node

# nm -gDC /opt/ros/jazzy/lib/librmw_fastrtps_cpp.so | grep rmw_create_node
0000000000026c30 T rmw_create_node

# nm -gDC /opt/ros/jazzy/lib/librmw_cyclonedds_cpp.so | grep rmw_create_node
0000000000023fb0 T rmw_create_node
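The two nm listings come down to naming: each typesupport's entry point embeds the typesupport name, while the RMW shim and every RMW implementation export the identical `rmw_create_node`. The sketch below reproduces the typesupport naming pattern visible in the nm output with a locally defined macro, `SKETCH_MESSAGE_SYMBOL_NAME` (hypothetical; it mirrors what `ROSIDL_TYPESUPPORT_INTERFACE__MESSAGE_SYMBOL_NAME` produces in the generated code, but is not that macro), and checks that three typesupports yield three distinct symbols for the same message. Because the names differ, a caller referencing one typesupport's symbol can't be satisfied by another library, which is the gap the typesupport_cpp shim fills; any library exporting `rmw_create_node` satisfies a caller of it, so the RMW shim can be swapped out at link time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

#define STRINGIFY_(s) #s
#define STRINGIFY(s) STRINGIFY_(s)

// Reproduces the pattern seen in the nm output (assumption, not the rosidl macro):
// <typesupport>__get_message_type_support_handle__<package>__<interface type>__<message>
#define SKETCH_MESSAGE_SYMBOL_NAME(typesupport, package, interface_type, message) \
  typesupport ## __get_message_type_support_handle__ ## package ## __ ## interface_type ## __ ## message

// One entry point per typesupport for the same message: all names differ.
static const char * const typesupport_time_symbols[] = {
  STRINGIFY(SKETCH_MESSAGE_SYMBOL_NAME(rosidl_typesupport_cpp, builtin_interfaces, msg, Time)),
  STRINGIFY(SKETCH_MESSAGE_SYMBOL_NAME(rosidl_typesupport_introspection_cpp, builtin_interfaces, msg, Time)),
  STRINGIFY(SKETCH_MESSAGE_SYMBOL_NAME(rosidl_typesupport_fastrtps_cpp, builtin_interfaces, msg, Time)),
};

// The shim and each RMW implementation export the same name.
static const char * const rmw_create_node_symbols[] = {
  "rmw_create_node",  // librmw_implementation.so (shim)
  "rmw_create_node",  // librmw_fastrtps_cpp.so
  "rmw_create_node",  // librmw_cyclonedds_cpp.so
};

// Returns true if all names are pairwise distinct.
bool all_distinct(const char * const * names, std::size_t count)
{
  assert(names != nullptr || count == 0);
  for (std::size_t i = 0; i < count; ++i) {
    for (std::size_t j = i + 1; j < count; ++j) {
      if (std::strcmp(names[i], names[j]) == 0) {
        return false;
      }
    }
  }
  return true;
}
```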