Splitting the Autoware.AI repository and changing the organisation

gbiggs · March 5, 2019, 7:59am

The current approach of using a single repository for all of Autoware has become increasingly
untenable as the size of the code has grown. There are several problems with maintaining this
approach.

The source code is buried several layers down a deep directory hierarchy.
New contributors are confronted with a huge amount of code, which discourages them from diving
in to make a bug fix. Not knowing where to put a package or algorithm they want to contribute is
also a symptom.
Doing CI for a single change in a single file, no matter how small, requires that the CI pipeline
build all of Autoware. This currently takes anywhere from an hour and a half up.
It is essentially impossible to version different parts of Autoware independently.
It is difficult to guarantee a separation between safety-critical and non-safety-critical code.

To fix this, one of the major items on the roadmap for 1.12 is to split the repositories and
reorganise the package layout. This will be done as early in the 1.12 iteration as possible, so I’d like to
start discussing it now and come to an agreement on what repositories we will create and what they
will contain.

Based on discussions with @esteve regarding how to lay out the repositories for Autoware.Auto, here
is a starting-point proposal for the new repositories.

autoware
Root repository. Contains a README file, the .repos file for checking out/installing Autoware
using vcs (which will be the preferred method from 1.12), and nothing else.
core_perception
Core packages related to understanding the world around the car.
core_planning
Core packages related to planning where the vehicle should go.
core_control
Core packages related to controlling the vehicle so that it goes where it should.
drivers
Drivers for interacting with hardware, such as the Velodyne driver. Ideally, all
packages in here will be candidates for eventually being pushed upstream somewhere.
utilities
Packages that we develop that are not core to Autoware, i.e. you do not need them to drive a
vehicle.
visualisation
Packages dealing with visualising sensor data, the state of the car, etc.

In addition to the above, we may create additional library-specific repositories for libraries we
create that do not depend on ROS or any functionality from a ROS package. A good example would be
an algorithm for a point cloud processing function. Thanks to the magic of vcs and colcon, such
non-ROS repositories are relatively easy to integrate into a workspace.

Each repository will follow a fairly flat directory structure. Although it is not a rule, as a
guideline I think each repository should be a collection of packages at the same directory level.

This work will be done fairly early in the work for 1.12, because we don’t want to do it while we
have lots of pull requests in flight. I will start another thread soon to discuss the timeline for
the first couple of weeks of the 1.12 work.

Related to splitting the repositories, we are trying to get our hands on the “Autoware”
organisation name. It has been registered by someone else (Shinpei claims it is not him) since
2017, but has not been used since creation. We think that we have a case for getting the name based
on GitHub’s strict anti-name-squatting policy and the activity level in our project. If we are able
to get the name, then we will perform the following actions.

Rename the CPFL organisation to Autoware.
Create a new CPFL organisation.
Move all non-Autoware related repositories to the new CPFL organisation.

We will wait a week or two between the first two steps to give people time to update the URLs in
their local checkouts.

If you are wondering why we are renaming organisations rather than creating a new one for Autoware,
there are two reasons.

It won’t mess with people’s subscriptions (we think).
Our fearless leader likes his stars.

Also related to splitting the repositories is reorganising the package structure. I will start another
thread to discuss that, including how much we need to do before splitting the repositories and how
much can be done after. Please save discussion on that topic for that thread.

gavanderhoorn · March 5, 2019, 8:12am

MoveIt underwent a similar transition/migration, but in the opposite direction (i.e. merging everything into a single repository, coming from multiple).

Perhaps @davetcoleman can provide some input on what the experience has been since then?

gbiggs · March 5, 2019, 10:52am

Yes, there are benefits to having everything in one repository.

@esteve pointed out to me the difficulties ROS 2 has had with the complexity of CI pipelines when a PR needs specific branches or commits from more than one repository. We’re hoping that having a relatively small number of repositories compared with ROS 2 will reduce that difficulty for us. We could have broken things down much further into smaller segments of functionality to satisfy goals like separating safety-critical and non-safety-critical code completely, but we don’t want too many repositories that will be too tightly integrated.

Installation or checking out a project from multiple repositories is also difficult. Here, the vcs tool is a life saver. I wouldn’t attempt multiple repositories without it.

Interdependencies between repositories can become problematic. Again, I think the small number of repositories will help. In addition to that, the proposed split is along natural breaks in the processing pipeline for core packages but not further than that, and the others are things that can naturally exist separately from core functionality. We will be relying on black box interfaces as much as is practical.

sgermanserrano · March 6, 2019, 2:29pm

@gbiggs is it planned to have independent releases for each repository or will there be a common release for all of them even if no changes have been made from the previous release?

gbiggs · March 7, 2019, 12:07am

Each repository will be independently versioned. If a particular repository wants to put out a patch release, for example, it can do so. If the same version of a repository is used for 1.12.1 and 1.12.2, that is also possible.

The magic of vcs means that we can specify that “Autoware 1.12” corresponds to:

core_perception 1.60.1
core_planning 1.12
core_control 2.3
drivers @commit 8B65D57D…

etc.
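As a rough illustration of the pinning described above, here is a minimal Python sketch that renders a vcs-style .repos file from a mapping of repository names to versions. The base URL and the helper name `render_repos_file` are hypothetical, and the drivers entry is left out because its commit hash is truncated above.

```python
def render_repos_file(pins, base_url):
    """Render a vcs-style .repos document (YAML) as text.

    pins maps a repository name to the tag, branch, or commit to check out.
    """
    lines = ["repositories:"]
    for name, version in pins.items():
        lines.append(f"  {name}:")
        lines.append("    type: git")
        lines.append(f"    url: {base_url}/{name}.git")
        # Quote the version so YAML does not read "1.12" as a float.
        lines.append(f"    version: '{version}'")
    return "\n".join(lines) + "\n"


# Version numbers are the illustrative ones from the post above.
AUTOWARE_1_12 = {
    "core_perception": "1.60.1",
    "core_planning": "1.12",
    "core_control": "2.3",
}

# Hypothetical location; the real organisation URL is not decided here.
BASE_URL = "https://example.com/autoware"

repos_text = render_repos_file(AUTOWARE_1_12, BASE_URL)
print(repos_text)
```

Saved as, say, autoware.repos, a file like this would typically be checked out with `vcs import src < autoware.repos`.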

Effectively the version of the autoware repository will define the version of Autoware in use, but we get the flexibility to manage API changes and patches and so on for each component independently. I hope we won’t diverge too much between the repositories, but this flexibility is useful.

It’s also easy to check out a workspace that has the master version of all repositories.

esteve · March 7, 2019, 8:57am

I’d be a bit more restrictive regarding versioning. All the core packages must have the same MAJOR.MINOR version to indicate that they are only compatible with each other, but PATCH releases can evolve independently. Basically, follow Semver.
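The MAJOR.MINOR rule could be checked mechanically, for example before tagging a release. This is only a sketch: the function names and the example version numbers are made up for illustration, and it assumes plain MAJOR.MINOR.PATCH strings with no pre-release suffixes.

```python
def parse_version(text):
    """Split a 'MAJOR.MINOR.PATCH' string into a tuple of three ints."""
    parts = text.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {text!r}")
    major, minor, patch = (int(part) for part in parts)
    return major, minor, patch


def core_versions_compatible(versions):
    """Return True if every core repository shares the same MAJOR.MINOR.

    PATCH numbers are ignored, so they are free to differ per repository.
    """
    major_minor = {parse_version(v)[:2] for v in versions.values()}
    return len(major_minor) <= 1


# Hypothetical version sets, for illustration only.
aligned = {
    "core_perception": "1.12.3",
    "core_planning": "1.12.0",
    "core_control": "1.12.1",
}
misaligned = dict(aligned, core_control="1.13.0")

print(core_versions_compatible(aligned))
print(core_versions_compatible(misaligned))
```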

I’d add a simulation repository that may contain any simulator-specific code, scenery and models. However, code in this repository is also a candidate to be pushed upstream to the respective simulator (e.g. LGSVL, Carla, etc.)

gbiggs · March 7, 2019, 9:13am

Yes, I agree. I wanted to show the flexibility we (and users) would have to create very custom combinations of versions just by producing a .repos file, e.g. to test a particular feature.

sgermanserrano · March 7, 2019, 10:36am

@gbiggs @esteve in order to future-proof the new repository structure we need to think of potential use-cases for Autoware. I can see 2 clear options for use:

All bundled in the same machine: this would effectively be the same approach as the current repo has, i.e. AV-related nodes, handling of startup and visualisation (Rviz) are done in the same machine.
Distributed: this option would run AV-related nodes on one machine, whereas handling of startup and visualisation is performed on one or more separate devices. This second alternative would be closer to what would be expected for an AV, where unneeded overhead is not put on the embedded device that is performing sensor processing, control, etc.

To achieve the above the msgs might be needed everywhere (I haven’t looked in depth as to whether they’d be needed on Rviz yet), so it might be beneficial for them to have their own repository.

We also need to consider how and what docker images will be generated and where the Dockerfiles will be hosted.

gbiggs · March 7, 2019, 11:45am

I can see the argument for that. However I can see the argument for having message packages where they are most used or produced, as well. Another thing to consider is having one single package that is depended on by everything else. This can potentially get annoying when that single package changes. (I can’t remember and ROS2 doesn’t compile on my iPad so I can’t check, but I recall that all messages in a package get compiled into a single library.) So even if they all go in a separate repository then we still may want to put them in separate packages.

Also, remember that having things in a repository does not change how much binary you put on a target. That’s determined by what you actually compile and use.

If rviz is going to display the message, it needs access to the message.

gbiggs · March 7, 2019, 11:46am

I expect that the same docker images would generally be created. It’s just that how they are built will need to be updated to match the new repository layout and use of vcs.

sgermanserrano · March 7, 2019, 1:20pm

gbiggs:

To achieve the above the msgs might be needed everywhere (I haven’t looked in depth as to whether they’d be needed on Rviz yet), so it might be beneficial for them to have their own repository.

I can see the argument for that. However I can see the argument for having message packages where they are most used or produced, as well. Another thing to consider is having one single package that is depended on by everything else. This can potentially get annoying when that single package changes. (I can’t remember and ROS2 doesn’t compile on my iPad so I can’t check, but I recall that all messages in a package get compiled into a single library.) So even if they all go in a separate repository then we still may want to put them in separate packages.

Also, remember that having things in a repository does not change how much binary you put on a target. That’s determined by what you actually compile and use.

Agreed with the above, what I tried to highlight is that we would need a way to compile/install just the messages in the visualising machine as opposed to having to build the whole Autoware stack for the sole purpose of handling/visualising the embedded device where the actual nodes would be running.

I meant that I haven’t looked into whether the current rviz setup actually needs any of the custom messages or if it is just using standard ROS messages for the visualisation.

amc-nu · March 7, 2019, 3:22pm

@sgermanserrano Visualization (message wise) so far only depends on visualization_msgs/Marker. As for plugins: jsk-rviz-plugins. Specifically TextOverlay and Plotter2D.

You might find some old nodes that still use jsk messages. But those haven’t been updated for a while.

Ian_Colwell · March 7, 2019, 4:19pm

Hi All, I’m new to autoware, but just wanted to say that I totally agree with this proposed organization of software.

We did something very similar for a project I previously worked on. We had it split up a bit less but had the same motivations for the split.

Regarding visualization, I’m wondering what is better:

Having a separate viz repo (currently suggested)
Having a single viz package in each core repo responsible for visualizing the custom messages from that core. So you’d have the perception_viz, planning_viz, control_viz ROS packages that are updated along with any changes to the message formats. Here I assume Autoware’s custom messages will be spread out into different custom message packages for each of the 3 main categories/repos.

Anyway, just a thought! I’m definitely happy that visualization is being separated out from the ros packages responsible for autonomy, just thinking it might be good to keep visualization code in lockstep with the messages the code is visualizing.

Regarding vcs:
I don’t know anything about vcs (first time hearing about it here), but why are we choosing vcs over git submodules?

Glad to see this organization effort! It will definitely improve the project immensely.

gbiggs · March 7, 2019, 11:05pm

OK, I see what you’re after now.

It will be possible to build just the messages, because they’ll be in a separate package. So the question becomes, are we happy checking out all the core repositories just to get the messages?

I think this suggestion is worth considering. If we put messages in the core repositories, then having visualisation packages in there as well does make sense in some ways. On the other hand, having a separate visualisation repository makes it clear that visualisation is of the messages, not the functionality. This might make it easier to work with 3rd party visualisation projects such as Uber’s xviz in the future.

Similarly having messages in a separate repository makes the separation between interface and implementation clear.

I personally am leaning in the direction of having a separate repository for messages, but I can see the argument against adding another repository.

davetcoleman · March 8, 2019, 4:36am

Sure, I’ll give my two cents!

I’ve loved the merging of MoveIt code into few repos, and I’d really like to merge more. I think a great example of a single-repo ROS project is the new navigation2 project - it’s way more consolidated than MoveIt is.

The source code is buried several layers down a deep directory hierarchy.

By splitting repositories, you’re going to reduce the directory hierarchy by likely just one level. Is this really worth it? Software projects get complex, but it’s not the repo layout that makes it easier.

New contributors are confronted with a huge amount of code, which discourages them from diving
in to make a bug fix. Not knowing where to put a package or algorithm they want to contribute is
also a symptom.

I really don’t see how separating the code across the internet (different github repos) makes finding code easier for new contributors. I’d argue it’s the opposite.

Doing CI for a single change in a single file, no matter how small, requires that the CI pipeline
build all of Autoware. This currently takes anywhere from an hour and a half up.

Even with split up repositories you probably should test it against the other repos every time also, to ensure the whole system builds. Otherwise you’re going to have to test it against the debians last time they were synced, which is every ~3 months? This means you can’t change the API of repos with each other because one of the other repos will always be out of sync.

There are lots of other clever ways to have CI only test relevant parts of the system, but this requires more coding. One very simple improvement is to skip all of CI every time the change list is only documentation (for example, .md files).
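A minimal sketch of that documentation-only check might look like the following. The set of suffixes and the function name are assumptions, and how the changed-file list is obtained (from the CI system or from git) is left out.

```python
from pathlib import PurePosixPath

# Assumed set of documentation suffixes. Note that ".txt" is deliberately
# excluded so that files such as CMakeLists.txt still trigger a build.
DOC_SUFFIXES = {".md", ".rst"}


def is_docs_only(changed_files):
    """Return True if every changed path looks like documentation.

    An empty change list returns False, so CI runs when in doubt.
    """
    if not changed_files:
        return False
    return all(
        PurePosixPath(path).suffix.lower() in DOC_SUFFIXES
        for path in changed_files
    )


print(is_docs_only(["README.md", "docs/build.md"]))
print(is_docs_only(["README.md", "src/node.cpp"]))
```

A CI job could call this first and exit early when it returns True.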

It is essentially impossible to version different parts of Autoware independently.

This is a limitation of the bloom-release tools, not of single git repos. If you manually released debians the way most other Ubuntu packages do, this limitation wouldn’t exist.

Also, do you really need different versions of software within the Autoware project? Isn’t it one large software project?

It is difficult to guarantee a separation between safety-critical and non-safety-critical code.

I don’t see how this difficulty really goes away with separate repos. Someone could still put unsafe code in the wrong places in either scenario. It’s always up to your PR reviewers to enforce this.

Side note: I have used some pretty cool Bazel rules to restrict access to certain parts of a code base from other parts.

gbiggs · March 8, 2019, 6:35am

All very good points, @davetcoleman! Thanks for the input.

I also like the navigation2 repository’s layout. I think it’s very clean and I prefer the same packages-at-the-top-level approach. However, one of the problems we have is the sheer number of packages. While that number may go down (I hope), we are probably still going to have a lot. navigation2 has 20 packages. We have 131.

Well that’s why we’re here. We have this proposal, we need to decide if it is worth doing.

This is true, although if they know they are looking for a perception algorithm then it is fairly obvious which repository to go for. However the same could be said for top-level directory divisions.

I’m hoping to get nightly debians going with our own little package repository eventually which will allow us to test against recent versions of other repositories using binaries. Same for tags, so testing against the most recent release of each repository when the release is made rather than needing to wait for a sync of the OSRF ones. And relatedly, in general I currently think that the master of every repository should build against the master of the others. PRs that require changes in another repository should be coordinated with those other PRs to be merged as simultaneously as possible. The CI would get messy and manual in these cases but in theory the split we have chosen breaks things along black-box lines so such multi-repository changes should be rare. I would welcome evidence against this and do not consider it a watertight case.

Yes, another true point. But with 131 packages I really want to automate releases.

Not really. A large part of the project is algorithms and users do tend to pick and choose. We also have commercial entities wanting to build custom combinations of bits of Autoware and they may want the latest perception algorithms but stick with a set of control packages that they know work.

I was more referring to being able to say “this whole repository is safety-critical code”, as an example. But it’s not really relevant as the proposed split doesn’t really split along these lines anyway.

I’d love to see those!

gbiggs · March 8, 2019, 8:22am

Another reason that @esteve reminded me of is that we are starting a re-implementation project to fix all the problems with .AI, and we want to integrate between the two. So for example reusing visualisation becomes a lot easier without worrying about package name clashes with other parts of Autoware between the two versions if we can reuse that repository as-is. Long term there are parts that we want to recycle wholesale (e.g. visualisation again), and so not having to switch to a different repository for that is also a benefit. Long-term, this is more relevant to the peripheral parts such as visualisation and simulation than the core parts, but in the short term the core parts are also relevant.

amc-nu · March 8, 2019, 9:32am

As a user of and contributor to the Autoware project, I am completely in favor of splitting the repository.
Having well-formed, independent modules will allow users/developers to take only the part required for a certain project/application.

Autoware.AI was born as a research project. Thanks to its flexibility and ROS compatibility, it is still widely used in other robotics applications.

Autoware has also, fortunately, been growing very quickly. However, due to its current size, number of dependencies, and interrelationships among modules, it is not easy to just take a part, plug it in somewhere else, and use it without having to compile the whole project. Making the modules self-contained and minimizing their dependencies will ease its use and its interaction with other platforms.

As a developer creating a new feature for Autoware or writing a patch, having to wait for all the unrelated packages to finish building is in many instances exasperating.

esteve · March 8, 2019, 12:32pm

I think having a separate repository for messages would definitely help here. Having the interface (messages) separate from the behavior (nodes) tends to be a good pattern for loosely coupled architectures.

amc-nu · March 8, 2019, 1:18pm

I also agree with separating the messages. Map and configuration related messages are used everywhere.
Some other messages haven’t changed for a while. We could keep them in a repo, release them, and just add them as dependencies.
