We could do it with submodules, but that approach has three disadvantages.
It only works for Git repositories. One of our long-term goals is to support Autoware as a distribution, not a single project, for example in order to work with external algorithm nodes. That means supporting packages from repositories outside the project, and those outside packages could potentially use a non-Git revision control system, which would make using them as submodules impossible.
It is easier to provide different versions of Autoware’s collection of repositories using vcs than submodules, because vcs works from a text-based input file. We can distribute different configuration files for different versions of Autoware based on different needs. Doing this with submodules is very hard.
Submodules and branches do not play nicely together.
vcs also tends to work well with the existing ROS package management approach.
I think that submodules are best for the situation where you want to treat an external project as part of your project and work the two pieces of code in lock-step, as if they are one. That is not what we are trying to do here.
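To make the vcs point above concrete, here is what a vcstool input file could look like. This is a hypothetical sketch: the repository names and URLs are invented for illustration and are not the actual Autoware layout.

```yaml
# autoware.repos — hypothetical example of a vcstool input file.
# A different file with different "version" entries is all it takes
# to describe another release of the distribution.
repositories:
  autoware/core_perception:
    type: git
    url: https://github.com/example-org/core_perception.git
    version: master
  autoware/core_planning:
    type: git
    url: https://github.com/example-org/core_planning.git
    version: 1.12.0
  drivers/example_lidar_driver:
    type: hg          # non-Git repositories are also supported
    url: https://example.com/example_lidar_driver
    version: default
```

Such a file would typically be checked out with `vcs import src < autoware.repos`, which is what makes distributing alternative configurations so cheap compared to submodules.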
@gbiggs This is a good idea! Thanks for facilitating the discussion.
@amc-nu We can reflect the spreadsheet we prepared into this reorganization. What we defined as modules there can become separate repositories, although @gbiggs also thinks that too many repositories are not suitable.
Perhaps I’m going off-topic now, but is there no recommended sensor suite or general arrangement of sensors that TierIV uses on their cars (and that has therefore been tested to work)?
I’m curious how a new user is supposed to know what sensors to buy or how to arrange them on the vehicle to use Autoware somewhat safely.
Some thoughts come to mind about the two directions: having a single repository containing all packages, and having separate repositories for each package. Each comes with its own pros and cons. (Obviously anything in between is also possible - a few repos with a few packages each.)
The main characteristics of a single repo - let’s start with the positives:
A single issue tracker makes it clear where to create / search for tickets.
A single PR is sufficient for any kind of change.
All packages should be consistent with each other in every single commit.
Making a release requires only a single tag (and in case of ROS a bloom invocation).
On the other hand the downsides of a monolithic repo are:
Access can’t be granted on a granular level - either a contributor has write access to everything or nothing.
CI is probably challenging to set up if you don’t want to build everything on each pull request, even when it only touches a specific subset of the repo.
If the number of issues / pull requests grows with the community, having everything in one place might become overwhelming.
Since releases cover the whole repo (at least with bloom in its current form), you have to release new versions of packages which haven’t actually changed in order to ship changes in other packages.
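On the CI point above, one common mitigation is to compute the affected subset of packages from the changed file paths and only build/test those. The sketch below uses hypothetical package paths; a real setup would also expand the set to downstream dependents before building:

```python
# Sketch: decide which packages a monorepo changeset touches.
# The package directories here are hypothetical examples.

PACKAGE_DIRS = {
    "core_perception/lidar_tracker",
    "core_planning/lane_planner",
    "visualization/rviz_plugins",
}

def affected_packages(changed_files):
    """Return the packages whose directories contain a changed file."""
    affected = set()
    for path in changed_files:
        for pkg in PACKAGE_DIRS:
            if path.startswith(pkg + "/"):
                affected.add(pkg)
    return affected

if __name__ == "__main__":
    changes = ["core_planning/lane_planner/src/planner.cpp", "README.md"]
    print(sorted(affected_packages(changes)))
```

A CI job would then restrict the build to that set (plus its reverse dependencies), instead of rebuilding every package on every pull request.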
Since the pros of one option are naturally the cons of the other, I will only mention aspects that differ for the separate-repository option:
Users / developers only fetch the code parts they actually want to modify and get binary releases of the other packages, which significantly reduces build time.
Naturally your CI will be much more efficient by limiting the testing to the actually affected subset of packages.
You will need a way to test changesets which span repository boundaries. For ROS 2 we commonly run jobs building multiple repos while using the same custom branch in all of them for related changes.
Managing multiple repositories does take more effort. That being said, tools like vcs and hub allow you to manage multiple repositories pretty effectively - from rebasing several repos to creating PRs for separate repos at the same time.
Some real world examples for the different options:
ros_comm: contains 32 packages. There is currently a desire to split this repo up - see, for example, the ridiculously long releases / changelogs of some packages.
colcon: each repo contains only a single Python package. While certainly on the extreme side, this nicely helps enforce modularity, and each package has its own release cycle and can follow semver without being affected by unrelated code.
The ROS 2 repositories fall somewhere into the middle between these two extreme options.
Some packages are tightly coupled and therefore grouped into a single repo to avoid the cons of separate repos and since they are versioned / released together.
Separate subsystems are still separated into different repositories. E.g. the command line tools in ros2cli don’t have to be in the same repo as the simulation-related packages. In these cases the goal of developing / maintaining / releasing them individually (and granting different people access) was weighted as more important.
This approach also makes it easier to accept new packages and still put them under the “official” ros2 org unit without having to integrate them into a monolithic repository (which would increase the effort for building / testing / CI / maintenance continuously).
Clearly one approach doesn’t fit everyone’s needs. So good luck finding the right balance for your use case.
I think we are aiming for a middle-of-the-road approach with the number of repositories we are thinking of. I think the balance of workload increase and benefits will work out, especially with the tools we have to manage multiple repositories at once.
Note: I do not really understand what benefit splitting and re-organizing the Autoware.AI repo will have. Since the plan is to transform it into a sandbox 12-18 months from now, I really do not think we would be spending our resources very wisely here.
However, since this topic will also apply to Autoware.Auto, and since at Apex we have 220 packages in a monolithic repository and everybody loves it, I will provide some input.
You also need:
What I suggest is to have the above as subfolders inside the autoware root folder of a monolithic repo.
@sgermanserrano I do not fully understand how the repository structure and where you install/run Autoware nodes are connected. As mentioned further down, you can easily create binaries for just subparts of the repo (e.g. visualization) and install them on the same machine as, or a different machine from, e.g. the real-time nodes.
Do we really plan to write our own visualization code and not use e.g. rviz, xviz, rqt_plot, etc.? With that all that we would need to save are visualization config files.
This is a) certainly not applicable to Autoware.AI, and b) whether a package is safety-critical or not could also be asserted via a CMake macro, and depending on that you can decide which CI rules and checks apply to that particular package.
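One possible sketch of that idea in CMake - the macro and variable names here are invented for illustration, not existing Autoware or CMake API:

```cmake
# Hypothetical macro: a package declares itself safety-critical, and
# CI (or later CMake logic) reads the resulting flag to choose which
# rule set and checks to apply to this particular package.
macro(declare_safety_critical)
  set(${PROJECT_NAME}_SAFETY_CRITICAL TRUE)
  # Example consequence: stricter compilation only for these packages.
  add_compile_options(-Wall -Wextra -Werror)
endmacro()

# In a safety-critical package's CMakeLists.txt:
#   project(lane_planner)
#   declare_safety_critical()
```

A CI script could then grep for the flag (or query the CMake cache) to decide whether to run the heavier static-analysis and coverage checks.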
@amc-nu all of your requests can be met in one repo. I would claim that it is actually even easier to design and define clean, minimal interfaces there. As for testing that no unintended dependencies creep in: colcon build builds the packages in an isolated mode.
I would also add:
you do not need separate feature and code freezes when doing releases; the two can be one and the same
a single entry point for developers: all developer work happens in one place
it is much easier to co-host documentation and code in one repo, and to actually check the documentation against changed code (e.g. changed executable names, APIs, …)
It will aid the transition between Autoware.AI and Autoware.Auto, although it is not essential for this.
Part of perception, as I understand it.
Part of planning.
Yes, this is potentially necessary.
Depends on what is in there. I’d prefer not to be managing someone else’s code if we can find a better way to use it.
Not to my knowledge, but I wouldn’t be surprised to see, for example, custom renderers for custom data.
I don’t think the repository organisation affects how minimal the interfaces are. I do think that if interfaces are in a separate repository it is mentally enforced that interface declaration is separate from implementation, but I wouldn’t separate the repositories solely for this.
I think this applies to any number of repositories. It’s a function of the branching model.
Achievable with an organisation, although I agree it is not as straight-forward.
If we split the repository, my goal is that each can be treated as an individual black-box unit with its own API and documentation.
Can you provide some examples? I believe it is possible to achieve traceability across multiple repositories as well, but I am concerned about the amount of manual work that may be involved. It’s also possible that much of this may be a tooling problem (especially the CI).
I don’t think this really changes if you have one or multiple repositories.
Organisations fix this, too. Plus as @dirk-thomas said if you want to restrict permissions to just part of the code then it is easier with multiple repositories.
Yes, as I said above I want to get nightly binary building going eventually. I don’t see this as an argument either way, but it is easier to do with split repositories if you can treat each repository as a single unit.
The rest of my life just became startlingly clear.
On the other hand, with 131 packages, releasing all of them individually would suck, so splitting into repositories and then being able to release several repositories at once is nice.
Please do so, it will contribute to the discussion and help us make a better-informed decision.
You should try running bloom on a repo with that many packages; you will be waiting a pretty long time to do that one release… (I am bothered every time by how long it takes to release ros_comm, which has less than a quarter as many packages.)
And then you notice that one of the 131 packages needs a bug-fix release for a single-line change, but you have to release all 131 packages again (each except one with an empty changelog), and every user has to download 131 new Debian packages.
Of course you could tweak bloom to support releasing subsets of packages from a single repo. But realistically, is anyone interested in putting in the effort to do that?
@Dejan_Pangercic the comment was a consideration to keep in mind when considering splitting the repository. But as it is today, if I want to run Autoware on a Synquacer (which is Arm-based) and have RViz running on x86, I would need two full copies of Autoware that need to be compiled separately.
A split repository would mean a higher degree of control over what to compile and install on a particular machine, without requiring deeper knowledge of Autoware; otherwise I do not see an easy way for a new user to decide which nodes are needed.
Just a few clarifications from me, which I’ll do inline, but in general I’d say that splitting things up in repositories can be helpful, especially for consumers of the project (assuming it’s a framework or SDK, rather than a stand alone application). However, it definitely comes at a cost, and I would buy the argument that it may not be the best use of resources depending on your schedule. On the other hand, that argument can be used to justify an indefinite amount of technical debt, so I always hesitate to follow it blindly.
So on balance, I don’t have a recommendation for your group, just comments.
My impression for why this was the case is that reviewing code for the two cases is wildly different and so you need to look at each pull/merge request to first determine how you should review the code (to what standard) and you have to make sure during iteration that you don’t “leak” into more critical parts of the code. With separate repositories it is clear, if you need to change the safety-critical code it will require a separate pull request and that is easily noticed and audited.
Perhaps I miss the point, but as @gbiggs mentioned, I don’t think this has any impact on code freeze versus feature freeze.
I think you’re mistaken here. GitHub has the same features as Gitlab w.r.t. reporting the status of CI/CD. We do not use this feature because we have custom infrastructure and have not taken the time to use the GitHub API to automatically report the build status. We do have this in ROS 1 and ROS 2, but we don’t use it as much in ROS 2 yet, e.g. here’s a pr picked at random:
As others pointed out, I think it’s actually easier to be granular with permissions (a good thing imo) if you use multiple repositories within an organization.
This is frustrating for me because I have explained this so many times…
It is not a limitation of bloom, but instead a limitation in our process or maybe a limitation of distributed VCS (i.e. git) depending on how you look at it. Basically it boils down to the requirement that each release has a tag, which I think is a reasonable and good thing to have for people consuming your software, and is reinforced by things like GitHub tying your tags to releases (not every tag is a release but every release is a tag).
If you keep that requirement then you cannot (in my opinion) realistically tag the repository in such a way that more than one version of a package can be represented easily. For example, say your repository has foo and bar packages in it: on your first release you release both at version 1.0.0, so you can use the tag 1.0.0 for the release. But then you want to release foo and not bar, so you set the version of foo to 1.1.0 and keep bar at 1.0.0 - but then what tag do you use? foo-1.1.0?
What if later I update foo and bar to 2.0.0, do I then use 2.0.0 again or use two tags, one for each? If the user wants to get the latest version of the software that works together, which tag do they choose? Remember that bar-1.1.0 could be newer than foo-2.0.0. Also bar-1.1.0 might not have a released version of foo as its peer, but instead some in between versions state.
I could go on for a while, but the point is that it’s a conceptual issue, not a limitation of bloom.
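The foo/bar scenario above can be made concrete with a small sketch. The release history below is hypothetical; the point is that once packages diverge, "latest tag per package" lands on different commits, so no single tagged tree represents the latest released state of everything together:

```python
# Sketch of the per-package tagging problem in a single repo.
# Each tag necessarily points at one commit of the *whole* tree.
history = [
    ("1.0.0",     "c1"),  # foo 1.0.0 and bar 1.0.0 released together
    ("foo-1.1.0", "c4"),  # foo released alone; bar unchanged at 1.0.0
    ("bar-1.1.0", "c9"),  # bar released alone; at c9 foo has unreleased changes
]

def latest_tag_for(package):
    """Latest tag naming this package (or the shared initial tag)."""
    candidates = [t for t, _ in history
                  if t.startswith(package) or "-" not in t]
    return candidates[-1]

# Checking out the latest tag per package gives *different* commits:
print(latest_tag_for("foo"))  # foo-1.1.0 -> commit c4
print(latest_tag_for("bar"))  # bar-1.1.0 -> commit c9
```

A consumer who wants "the latest released foo and bar that work together" has no tag to choose: foo-1.1.0 predates bar’s fix, and bar-1.1.0 sits on a tree containing unreleased foo changes.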
You can already do this using the ignore files. It would be trivial to have this as a command line option to bloom.
However, the conceptual issue still stands, if you find a small bug after a big release that you fix upstream then you still need to do a new release and then you’re back to the tagging issue I mentioned above. Even if you tag new releases for all packages and only want to actually bloom one package’s new version, then you could do that but you’d have a mismatch between binaries (debian packages) and what’s in the source tree, which is confusing for contributors and consumers of the software.
The right answer, if that’s your concern, is to split the repository up. Otherwise, I think you need to live with the “useless version increments” in unchanged files and the slow release process and the redundant updating of basically unchanged binaries that you mentioned.
@gbiggs (CC: @aohsato) Should we rename core_control to core_actuation? The figure in Overview has sensing, perception, decision, planning, and actuation. It does not have control. I’m thinking which repository should hold the actuation layer discussed in #1677.
Although “actuation” is an often-used term in robotics, so is “control”. In this case I feel that “control” better describes what is going on, i.e. controlling the car to follow a planned motion, with actual actuation (controlling the wheel and steering actuators) being a subset of that.
Then the first thing we should do is come to a consensus on what constitutes planning and what constitutes control.
In my experience in the manipulation world, once you have a set of joint states over time (a trajectory) to achieve, anything after that is control. Creating that trajectory is planning. My experience in mobile robotics has planning as producing a path to follow to the goal and a shorter path to follow in the immediate vicinity of the robot, and deciding what velocities to drive and turn at to follow that path being control. I think the two are fairly similar. But I don’t know if autonomous driving follows the same convention or not.