Build Systems, Package Management, and Nix

Dude, really? You can’t be serious, and you still somehow find a way to complain about people not being nice to you…

Same radio silence as every time people ask you to elaborate a proposal for one of the million problems you see in ROS. Will you reply to @peci1 with just one rough draft of a proposal?

4 Likes

And now you wonder why you do not get proposals. You can add your own ideas and your reasoning about why ROS does not get any REPs, not just from me but from the entire ROS community, as @gbiggs recently asked. This is one of only two things the OSRF has done in the last eight months:

Should I give that thread a little publicity, too?

But here, please help @AngleSideAngle by using your experience in Rust and Cargo instead of replying to me and making non-funny jokes.

@doganulus What is your proposal? We can’t read your mind. Just propose something so that everyone can contribute; that’s how engineers work, not with vague statements and platitudes. We’ve been asking you every time you complain about something since you joined this community, and you continue deflecting.

4 Likes

Correct, Nix is core to the OTTO build system, and has been for a couple years now. We wrap colcon in areas where that’s relevant, and we only use NixOS for our Hydra server.

Per one of our team members:

The generation step performed by [an internal custom] package takes an already existing snapshot in rosdistro (so all the repositories are represented by a git hash) and then generates the package files and the nix files that specify the source of each package at this particular hash. Besides generating the nix files for individual packages, this also creates a flake.nix file, which serves as the main entry point to performing the builds in nix. The files generated from this step are then pushed to the nix flake repository and given a tag.
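Based on that description, a generated per-package file might look roughly like the sketch below. This is an illustration only, not Clearpath's actual output: `buildRosPackage` is borrowed from the community nix-ros-overlay convention, and the package name, version, and hashes are made up.

```nix
# Hypothetical shape of one generated package file: the source is pinned
# to the exact git hash recorded in the rosdistro snapshot.
{ lib, fetchFromGitHub, buildRosPackage }:

buildRosPackage rec {
  pname = "example_interfaces";
  version = "1.2.0";
  src = fetchFromGitHub {
    owner = "ros2";
    repo = pname;
    rev = "0123abcd";          # git hash from the rosdistro snapshot (made up here)
    sha256 = lib.fakeSha256;   # replaced with the real content hash at generation time
  };
}
```

The generated `flake.nix` would then aggregate files like this one into a single package set.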

As for ‘why didn’t we go all out’: we didn’t find any technical reasons why it couldn’t be done, but we haven’t looked that hard, due to more general engineering reasons. A few things we considered:

  1. General cost/benefit/risk of a large system migration (no matter the technology) vs other software and infrastructure investments we could be making
  2. Familiarity of Nix vs. alternatives among our team and the general ROS/software development community
  3. Support of Nix by the various other tools in our CI system (not just build tools, but analysis tools, audit tools and the like)

All of this is to say that I’m not aware of any technical reason your ambitious idea might not work.

5 Likes

Thank you, Ryan.

Indeed, the main problem is not technical but organizational, as I said in the beginning. Maybe Clearpath could come up with a generalization of their approach and prepare a proposal that cannot be arbitrarily refused by the Foundation committees. It seems you are close to something useful, and judging by the messages in this thread, the community passionately wants proposals.

Unfortunately, the current Kilted roadmap looks rather empty. So, together with Nvidia’s Bazel proposal, I hope your efforts will allow rclcpp to be built using Nix, Bazel, and plain ament-free CMake.

Please separate runtime and development artifacts too, as any credible library does.

Yes, that is roughly what I am proposing. To clarify one thing we’re thinking differently about, I don’t believe ROS was designed to be build system agnostic. package.xml is a standard that is only used by ros-specific build tools, so parsing nix expressions to port packages over to other package distribution methods would be logically equivalent to what is currently done using superflore and metadata within package.xml files.

And yes, ros-nix-overlay is currently built using superflore and the data within package.xml files. If ROS were distributed primarily through pixi/conda, ros-nix-overlay could be built the same way by parsing pixi.toml files. Admittedly, parsing Nix is a bit harder than XML or TOML because Nix is Turing-complete, so you’d probably need to write a function in Nix that returns a set of all a package’s dependencies and writes them somewhere.
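A rough sketch of what such a function could look like, assuming a conventional nixpkgs-style package set (the argument shape and invocation are illustrative, not an existing tool):

```nix
# Hypothetical helper: given an evaluated package, return the names of its
# declared build inputs, so an external tool can read them as JSON, e.g.:
#   nix eval --json -f deps.nix ... (exact invocation depends on how pkg is supplied)
{ pkg }:
map (d: d.pname or d.name)
    ((pkg.buildInputs or [ ]) ++ (pkg.propagatedBuildInputs or [ ]))
```

The point being that the dependency data is recoverable, but only by evaluating Nix, not by parsing a static file.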

1 Like

If you collect 10 random packages distributed from a single ROS Nix repository, you would not end up with 10 different versions of all the libraries they depend on. On the other hand, if you collect 10 random packages from 10 different commits of a ROS Nix repository, then you would enter the described predicament. Note that the latter scenario takes deliberate effort to get into.

Conda and nix have different ways of handling version management because conda focuses a lot more on package versions. For example, if I want to install an old version of gcc from conda-forge, I’d install the gcc package with a specific version tag. On nix, I’d have the choice between (all up to date in main) gcc14, gcc13, gcc12, etc. This means the only reason a nix flake needs to depend on an older version of the nixpkgs repo is if it needs something that is unmaintained and removed from nixpkgs, at which point security patches aren’t expected anyway.
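The two styles can be illustrated side by side (a sketch only; the exact conda-forge package name and version constraint syntax may vary by channel):

```shell
# Conda: one package name, selected by a version constraint
conda install -c conda-forge gcc=12

# Nix: a distinct attribute per major version, all maintained on main
nix shell nixpkgs#gcc12
nix shell nixpkgs#gcc13
```

In the Nix case, both attributes track current nixpkgs, so they keep receiving fixes even though they are "old" compiler versions.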

Since you probably aren’t convinced this is practical without extreme maintenance burden, here’s a cool graph demonstrating that nixpkgs has both the most packages and the most up-to-date packages of any packaging repository, despite being volunteer-driven. This is in part due to a tool called Hydra, which seems to be roughly equivalent to superflore. It handles CI, rebuilding all packages when a dependency is updated, and ensuring no changes cause existing builds to fail.

That’s not what I’m talking about. I’m talking about downloading github.com/CoolResearcherA/BestSLAM, github.com/PlanXYZ/IROS_PLanner_Release and gitee.cc/xingu/tiang_exploration (I’ve made the names up, but you get the gist, right?).

Current colcon (and the older catkin) build tools allow you to just put these into a workspace, run rosdep, and build the workspace. If you’re lucky, the fact that each was written for a different version of ROS doesn’t matter. If you’re not, you’ll have to start fixing the API changes. However, none of these packages prescribes a particular version of anything (which is very bad for reproducibility, but very good for people who just want to try it with their own ROS release).
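For reference, the workflow being described is roughly the following (using the made-up repo names from above):

```shell
# Put the packages into one workspace, resolve declared deps, build everything
mkdir -p ws/src && cd ws
git -C src clone https://github.com/CoolResearcherA/BestSLAM.git
git -C src clone https://github.com/PlanXYZ/IROS_PLanner_Release.git
rosdep install --from-paths src --ignore-src -y
colcon build
```

Nothing here pins versions: whatever your current distro provides is what gets used.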

Whereas, IIUC, the proposed Nixification would lead to a state where BestSLAM pins GCC 4.3, IROS_PLanner_Release pins GCC 8.3.1, and tiang_exploration pins GCC 9.0.2. And with 99% probability, they do not pin these versions because the authors know their code needs something from those particular versions that would work in neither older nor newer releases. They pin these versions without knowing anything about compiler versions (because they’re students), and the particular versions get selected just because they were the “default”. But they are fixed forever, because hey, it works.

For me as a researcher, it is easier to just try building the package on current Ubuntu and banging it until it compiles, than figuring out that a package built with GCC 4.3 is incompatible with another GCC 9 package, and trying to find some common version that would work for both. But hey, when I change the version of GCC, I also need to change versions of other things, don’t I? And the waterfall begins… Or not?

Assuming each of these imaginary repositories is packaged as a flake, and is not part of the hypothetical Nixified ROS distribution, then yes, they would depend on different versions of gcc.

Let’s say BestSLAM depends on CUDA 10, which supports Ubuntu 14, 16, and 18. You’re using ROS Jazzy because you want the latest ROS features, and it supports Ubuntu 22 and 24. I’d much prefer having GCC versions duplicated and a guarantee that BestSLAM will work with the rest of my code on the first try to being forced to go through BestSLAM and update a bunch of calls that broke between CUDA 10 and 12. This may not be a hypothetical…

One thing that’s important to understand is that Nix packages depend on specific versions of their dependencies, as enforced by hashing of their source code. This means I could have two (or more) versions of gcc on the same system, and each package would depend on the correct version. This is accomplished by storing packages in /nix/store under their unique hashes, meaning every actual reference to a package is handled by Nix via symlinking or by adding binaries to PATH.
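A minimal sketch of how two packages could coexist with different compilers. The package files here are hypothetical; `gcc12Stdenv` and `gcc13Stdenv` are the nixpkgs attributes for build environments based on those GCC versions:

```nix
# Hypothetical: two packages built with different toolchains coexist,
# because each lands in /nix/store under its own hash.
{ pkgs }: {
  bestSlam = pkgs.callPackage ./best-slam.nix { stdenv = pkgs.gcc12Stdenv; };
  planner  = pkgs.callPackage ./planner.nix   { stdenv = pkgs.gcc13Stdenv; };
}
```

Neither build can observe the other's compiler, so "which GCC is installed" stops being a system-wide question.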

Thanks, now I have a better idea of how things work with Nix. But a few things are still not clear to me.

When packages specify exactly what versions they depend on, how do the maintainers figure they should update a dependency? Isn’t this the extra burden?

Currently, as a package maintainer, I don’t care about the versions of packages I depend on (unless there’s a critical bug fix, etc.). I just depend on the name and hope that other maintainers are similarly sane and do not break API in a package released to a distro. If my dependencies need to be updated, they just are, and I don’t care unless the buildfarm winks at me.

What does the process of updating to the latest versions of my dependencies look like in Nix world? Can something automate that?

Also, back to the CUDA example. I think your example only works if BestSLAM exposes only a ROS API. If I wanted to link against its library instead, there’d be a clash of CUDA versions during linking. Or maybe even earlier: Nix should tell me that a transitive dependency clashes with my direct one.

Yes, that is roughly what I am proposing.

Thanks, that is much clearer.

I really appreciate the boldness, but I guess the only realistic way you can reach consensus on moving from package.xml to a Turing-complete, Nix-specific DSL is for the vast majority of ROS users to switch to Nix. Note that “reaching consensus” also means convincing people and companies in the ROS community to contribute work and/or fund work in that direction.

So I guess that improving documentation on how to use ROS with Nix right now is both useful in the short term and a prerequisite for convincing everyone in the ROS community that they should stop using package.xml.

To clarify one thing we’re thinking differently about, I don’t believe ROS was designed to be build system agnostic. package.xml is a standard that is only used by ros-specific build tools,

I am not sure I got what you were trying to convey here. In particular, it is not clear what you mean by “build system” and “build tools”. Unfortunately, there are no universal definitions for these terms, but if we use the definitions mentioned in the article A universal build tool (that you already linked), we could define them more or less as:

  • build system: what is used to transform the source code of a package into a usable form, like CMake, Meson, Bazel, Autotools, or setuptools
  • build tool: a tool that takes many packages and calls their build systems to ensure that the packages can be used. This is what colcon is, but build systems such as CMake (via FetchContent or ExternalProject) or Bazel, or systems like Nix or the Debian and Red Hat tooling for building apt or dnf packages, can also fill that role when combined with bloom or superflore

If we agree on these definitions, I am not sure what you mean by “I don’t believe ROS was designed to be build system agnostic”. “Build systems” do not consume package.xml files, so I am not sure why build systems play a role here.

so parsing nix expressions to port packages over to other package distribution methods would be logically equivalent to what is currently done using superflore and metadata within package.xml files. And yes, ros-nix-overlay currently is built using superflore and the data within package.xml files. Admittedly parsing nix is a bit harder than xml or toml because nix is turing complete, so you’d probably need to write a function in nix that returns’ a set of all a package’s dependencies and write them somewhere.

“Logically” equivalent does not mean “practically” equivalent. As you correctly point out, extracting metadata from a Turing-complete DSL that requires custom evaluation is not nearly as practical as just reading it from a simple package.xml. Furthermore, not controlling the abstraction but reusing an existing format is also problematic because it is tricky to extend. Let’s say I want to declare a dependency on a package that is not available in nixpkgs. With package.xml/rosdep it is possible to define a custom key that is only resolved on a single distribution. Clearly, I can imagine you could find hacks and/or extensions to permit that on Nix flakes, but again I doubt you can find consensus on that until a large part of the ROS user base switches to Nix.
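For context, the custom-key mechanism being referred to looks roughly like this rosdep rule file, registered via an entry in a sources.list.d directory (the key and package names below are made up):

```yaml
# custom-rules.yaml: a key that only resolves on one platform.
# On other platforms, rosdep would report the key as uninstallable.
vendor_slam_lib:
  ubuntu: [libvendor-slam-dev]
```

A package.xml can then say `<depend>vendor_slam_lib</depend>` without caring how each platform satisfies it.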

Not necessarily, I believe.

The migration from rosbuild to Catkin was done incrementally, with packages migrating after their dependencies had migrated.

I could see a similar thing perhaps also working for what @AngleSideAngle is suggesting – but admittedly I’m not sure whether we could have non-Nix packages depend on already-Nixified dependencies.

1 Like

Hi,

That’s an interesting (lengthy) discussion right there, and I’m just chiming in to genuinely ask: what is the problem we’re trying to solve here?
I think that got lost in translation somewhere in the nitpicky technical details. Going back to the initial post, the following is made explicit:

But that’s about it.
The current tooling could certainly be improved and its usability simplified, but I fail to understand what it prevents us from achieving.

In my own experience (which is necessarily limited), dependency hell more often than not stems from package devs who one way or another mess up the package.xml (e.g. a missing dep, a wrong dep type, a dep that doesn’t exist in rosdep, declaring boost a.k.a. libboost-all-dev when they really only need libboost-foo…). Does Nix prevent that somehow?

2 Likes

We can ask Nvidia devs why they didn’t declare a CUDA dependency for their curobo library, which apparently uses CUDA.

Nope, isaac_common_ros does not include it either:

Is there any Nvidia developer here? What prevents you from declaring CUDA as a dependency using this great tooling?

That’s actually a good question. I mean, cuda is in rosdistro.
So, case in point?

Edit: it isn’t declared in isaac_ros_common either.

There is nothing stopping someone from using the cuda key in rosdistro.
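For reference, using that key would be a one-line addition to each affected package.xml (assuming the rosdistro key is spelled `cuda`, as mentioned above):

```xml
<depend>cuda</depend>
```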

Typically when dependencies aren’t specified correctly in package.xml, they will manifest in one of two ways:

  1. Users find that they are missing dependencies when they go to build
  2. The package fails to build for any of the binary platforms that rely on the package.xml specification (deb, rpm, conda, yocto, etc)

Since NVidia has their own installation procedure for IsaacSim packages, they kind of take care of (1) for the user.

For (2), these packages have also never been released through the normal binary mechanisms, for one reason or another. My best guess is that they would fail to build binary artifacts, though.

1 Like

Any chance that dependency management isn’t as simplistic as you portray? Especially for complex robotic software.

Can you describe every piece of software in the world in 10,000 lines of mega YAML files? Or 100,000?

The Socratic method is the best way to reveal correct answers…

1 Like

I’m not claiming that it’s simple, just that if you incorrectly specify dependencies, it may cause failures at either build time or packaging time. This is sort of a universal issue that can appear in any build system.

I’m definitely interested in alternative approaches to dependency management, which is why I have been such a big cheerleader for @traversaro and the rest of the Robostack team. It represents a different way of thinking about how we manage dependencies and is already improving workflows around development and CI. Perhaps Nix-based dependency resolution and builds could also provide value, I know that our friends at Clearpath are already using it quite heavily.

Having done a good chunk of Bazel porting, I would also be interested to see an approach that used bzlmod and the Bazel Central Registry. Of course, that doesn’t get you away from a giant repo of YAML files (though the BCR uses JSON). It seems that the Bazel-plus-robotics community is still small and hasn’t quite gained the critical mass it needs to maintain all of the dependencies. I think the biggest hang-up for Bazel is getting someone to commit to doing the work and maintaining it.

7 Likes

This thread kind of blew up, and I simultaneously got rather busy. I discussed the topic a bit elsewhere with @mjcarroll and have decided the best way forward is for me to work on solving the hurdles of ergonomically using Nix to build a ROS project (containing packages that are built with ament, setuptools, Nix, etc.) and publicly release a demonstration. It’s possible this will be tested on an autonomous vehicle before the end of the year.

Edit: adding the following because I think it’s relevant

The reason a better build or dependency management system is necessary, whether it’s Nix, Bazel, or conda, is that rosdep on top of Ubuntu is not guaranteed to be able to install a package’s dependencies and has no way of resolving version conflicts. This makes it unusable for some projects. I believe Nix specifically has numerous benefits (as outlined in the previous posts), and I intend to make a demonstration of its use, since something tangible to play around with would help the discussion.

6 Likes

rosdep on top of Ubuntu is not guaranteed to be able to install a package’s dependencies

As long as the dep is declared and exists in rosdistro for a given distro, it should be installable on said distro. I can’t imagine Nix (or any other dependency manager) being able to resolve an undeclared/unlisted dep.

and has no way of resolving version conflicts

I’m not seasoned enough in Nix to appreciate all the subtleties and thus grasp its magic in resolving version conflicts in projects such as ROS, which relies largely on messaging and dynamic library loading. I’m thus looking forward to seeing the demo and learning :+1:.