Package discovery - experimental tools in rosindex, and request for experiences

I’ve been working on upgrades to rosindex recently, which I view as the primary tool in the ROS ecosystem for package discovery. I’d like to solicit comments here on some proposed features to help package discovery, as well as get any general comments on what would help package discovery in ROS.

My development rosindex build, which is many commits ahead of the official version, has, in addition to major changes in package search, two new fields aimed at package discovery: relative download counts for each package, and whether the package is “core” (which I define as required by desktop_full). You can see the package search page here. You can sort packages by download count by clicking on the down arrow ↓. Whether a package is core or not is shown by the target ⊙. You can exclude core packages by entering core:false in the search bar.

Any comments on whether this is useful, or suggestions on how it might be improved? I included the “core” column and filter because I tend to find that download counts are normally dominated by core packages, which I don’t need to see if I am trying to find an interesting undiscovered package.
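To make the intent of the “core” filter concrete, here is a minimal sketch of hiding core packages and ranking the rest by download count. It is not the rosindex implementation; `PackageEntry`, `discover`, the package names, and the download numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PackageEntry:
    name: str
    downloads: float  # relative download count; the scale here is made up
    core: bool        # True if required by desktop_full


def discover(packages, include_core=False):
    """Sort by download count, highest first, optionally hiding core packages."""
    visible = [p for p in packages if include_core or not p.core]
    return sorted(visible, key=lambda p: p.downloads, reverse=True)


# Hypothetical index entries, not real download data.
index = [
    PackageEntry("core_pkg_a", 100.0, True),
    PackageEntry("niche_slam", 4.0, False),
    PackageEntry("core_pkg_b", 80.0, True),
    PackageEntry("niche_driver", 9.5, False),
]

non_core_names = [p.name for p in discover(index)]
print(non_core_names)
```

With core packages included, the two hypothetical core entries would take the top of the ranking, which is the effect the filter is meant to avoid.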

I’d also like to hear any general comments on package discovery, such as existing useful tools for this, or what would make it easier.

Wow, this looks really interesting. I like the idea of core/non-core packages.

A good list of packages that are frequently used in some way is REP 2005. It’s overdue for being updated, but this (and the updated version, when it comes) are a good reference point. We could add a machine-readable version of this when we revise it to make use cases such as yours easier.

Wow, this looks fantastic! ROS Index is well overdue for a facelift. I think the core / non-core distinction is helpful. It might also be helpful to filter out packages that are exclusively message definitions.

I might push back a bit and ask you: if we were to extend package.xml, what would you like to see? I’ve been saying for quite some time that we need a mechanism for tagging package types (e.g. control, planning, hardware drivers, hardware type, etc.).

Thanks for pointing out REP 2005, that is not something I was aware of.

If you ever have a serious discussion about the role and possible updates of REP 2005, I hope you include me. I would have a lot of opinions about it as someone who has tried to deal with the ROS universe of code in rosdoc2 and rosindex.

Briefly though, I don’t think such a list is a useful tool for package discovery, but it is still good to know that it exists.

One place it would be useful though is as a reference in understanding the health of the ROS project. I’ve been investigating that lately, using cauldron.io and GrimoireLab. You can see for example one attempt at looking at activity in ROS core here. I used my own definition of “core” but it would be useful to have an official definition.

I was wondering if you would object to tracking package download counts. IIRC you had some cautions when this was discussed earlier, but perhaps I was mistaken.

Tags

As to tags, first the current status. rosindex already supports tags via rosdoc2.yaml, but this is not really promoted, and IIRC only 4 packages have tags defined. rosindex fakes tags by splitting package names by “_”, so that “control_msgs” is assigned tags “control” and “msgs”. This would be good to keep in mind in naming packages. That is actually quite useful in package searches. If we ever write a guide to documenting packages, it would be useful to mention these capabilities.
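The name-splitting trick is simple enough to sketch. The helper names below (`implicit_tags`, `search_by_tag`) are hypothetical and are not taken from the rosindex code base; the sketch only illustrates the behaviour described above.

```python
def implicit_tags(package_name):
    """Split a package name on "_" and treat each non-empty piece as a tag."""
    return [part for part in package_name.split("_") if part]


def search_by_tag(package_names, tag):
    """Return the package names whose implicit tags include `tag`."""
    return [name for name in package_names if tag in implicit_tags(name)]


print(implicit_tags("control_msgs"))
print(search_by_tag(["control_msgs", "nav_msgs", "slam_toolbox"], "msgs"))
```

This is also why naming matters: a package whose name does not contain the word a user searches for gets no implicit tag for it.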

I’ve considered undertaking a project myself to curate ROS packages using tags. (I don’t believe you could get 2000 packages to be tagged by their authors at this late date in the development of ROS.) But if you try using the new search features of rosindex, I’m not really sure that is valuable. For example, if you simply search ROS Humble for “slam” and then sort by download count, you can pretty quickly identify the active SLAM packages. Tagging a package “slam” is not really needed when search is more effective.

Package.xml issues - Project.xml anyone?

You asked what would be useful in package.xml. I could have some minor suggestions, but to me the glaring problem from the perspective of discovering ROS capability is that we are focused on packages, but there is not really any defined way to understand the parent of packages, which I’ll call “Projects”. In rosindex we use repositories as a stand-in for that (and I’ve added “org” in package search as well, which is the parent of repos), but these are heuristic indications of the existence of what are really fairly well-defined projects. Some projects use multiple repos for their components, which defeats this heuristic. Some projects are well understood as important to ROS (ament, colcon, various DDS vendors, navigation2, moveit2, rosbag2, rviz, ros2_control for example), but if you are trying to locate capability that is outside the well-known projects, in many cases you are looking for a project, not a package. The way ROS is defined, you can only find packages, and then identify projects. The many packages of the well-known projects make it more difficult to find a lesser-known project.

In package documentation (rosdoc2), many project packages don’t even bother with a README because there is a well-defined set of documentation at the project level. rosdoc2 and rosindex struggle to find that. Most of the larger projects have their important documentation at the project level, not at the package level.

I agree. It’s a formalised list of “this is core”, and I think that a machine-readable version would be useful input to ROS Index for a search filter.

ROS 1 used the name “stack” for groups of packages that are intended to work together.

http://wiki.ros.org/action/show/rosbuild/Stacks

Do you know why this was not included in ROS 2? Was that a deliberate decision, or just something nobody bothered to move forward with?

Stacks were removed in Groovy Galapagos:

https://wiki.ros.org/groovy#Removal_of_Stacks

We added the concept of a metapackage in REP-127 at the time to cover the use case.

When moving forward with ROS 2 we chose not to migrate forward the metapackage as a distinct type of package with enforced capabilities, but we kept the flag in the package.xml metadata in the REP 149 #metapackage section.

Some logic to find reverse depends with the <metapackage> tag might make sense and expose that differently than just a reverse package dependency.

Re tags and other metadata

For tags, the original rosindex author @jbohren / @jbohren-hbr put some thought into some proposals, which are embedded here: ROS Index Metadata. However, we never had the time to fill this in. I think that going in this direction and creating a formalism for tagging conventions inside the <export> tag would be great. With that we can improve both rosdoc2 and rosindex to leverage those tags for appropriate cross-linking and discovery.

For example, the Tutorials tab is intended to use those tags, which is something we’ve long wanted to do: formalize how to capture tutorials in package.xml exports section · Issue #104 · ros-infrastructure/rosindex · GitHub. Since the tab is empty right now, we should probably disable it: Comment out "Source Tutorials" since they're not working yet · Issue #93 · ros-infrastructure/rosindex · GitHub

Searching in humble, only 6 packages set <metapackage/>. None of the core metapackages (like desktop) are using this. There are many more than 6 de facto or claimed metapackages.

I still believe that we should detect and document metapackages using the functional definition from REP 149 #metapackage “Because metapackages only supply execution-time dependencies, they use <exec_depend> to list the packages in their group” but there were objections to that because it is a heuristic and not a definitive tag - but the definitive <metapackage/> tag is not widely implemented.
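As a rough illustration of the two approaches, the sketch below checks a package.xml first for the explicit <metapackage/> flag and then falls back to the functional definition (only execution-time dependencies). The function name `classify`, the labels it returns, and the exact set of tags treated as disqualifying are my own assumptions, and as noted the fallback is only a heuristic: a launch-only package could match it too.

```python
import xml.etree.ElementTree as ET

# Dependency tags that suggest the package builds or tests something itself.
NON_RUNTIME_DEPENDS = {
    "depend", "build_depend", "build_export_depend", "test_depend", "doc_depend",
}


def classify(package_xml_text):
    """Return "declared", "heuristic", or "regular" for one package.xml."""
    root = ET.fromstring(package_xml_text)
    export = root.find("export")
    if export is not None and export.find("metapackage") is not None:
        return "declared"   # explicit <metapackage/> flag
    tags = {child.tag for child in root}
    if "exec_depend" in tags and not (tags & NON_RUNTIME_DEPENDS):
        return "heuristic"  # only execution-time dependencies are listed
    return "regular"


# Hypothetical manifest: a bundle that only lists exec_depend entries.
BUNDLE = """<package format="3">
  <name>my_bundle</name>
  <buildtool_depend>ament_cmake</buildtool_depend>
  <exec_depend>my_driver</exec_depend>
  <exec_depend>my_msgs</exec_depend>
</package>"""

print(classify(BUNDLE))
```

Reporting the two cases with different labels would let an index show which metapackages are declared and which are only inferred.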

But back to the issue of the missing parent-of-package type, you could use a metapackage for that, but really it does not fit exactly, plus it is a package so cannot be typically set at the parent level in a repo with packages under it, which is the most common organization of the parent-of-package type (is it called project? framework? stack? I’m leaning now toward “framework”).

We also have the concept of variants in ROS: Using variants — ROS 2 Documentation: Rolling documentation

That we track here:

Thinking back, I think that we actually purged the <metapackage/> tags because of how strictly they’re written in REP 149 and because they previously triggered a different build process, which we didn’t want to support in ROS 2.

It is a moderately common pattern to have a metapackage as a peer of other packages in a single repository, collecting the implementation packages into a single bundle. But this can become an issue with circular dependencies if the metapackage wants to be more generic, more about a use case than about the code released, and brings in an external dependency. Suppose two closely related repositories each want to bring in one or more dependencies from the other (say metapackage A1 depends on all of A and B2, and metapackage B1 depends on all of B and A2). This becomes unreleasable, because repository A depends on repository B, and likewise repository B depends on repository A.

Similarly, the groupings are usually coupled to a use case or application, whereas the core packages are often stable. We don’t want to couple releases of core packages with metapackages, which usually have no content and thus don’t need frequent updates, but which, if they do update, shouldn’t retrigger large rebuilds by forcing the rerelease of the constituent packages with empty changelogs. This is why all the core ones are pulled out into a separate repository that only gets updates when the groupings change. For these core packages it definitely makes sense to have standardized entry points. However, part of the calculus is that if it’s just a list of packages, how much custom tooling and infrastructure do we want to create and maintain for the marginal benefit over the simple list?

Here you can find an unfinished proposal including a discussion on the missing level of the hierarchy, which it calls a collection:

The proposal had other elements for build and dependency management, but essentially a package collection is developed by a single entity and has a unique URI in the form of <hostname>/<organization>/<collection> such as github.com/autowarefoundation/autoware.universe.
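For illustration only, a collection identifier of that form could be parsed as follows. `CollectionId` and `parse_collection_uri` are hypothetical names and are not part of the proposal.

```python
from typing import NamedTuple


class CollectionId(NamedTuple):
    hostname: str
    organization: str
    collection: str


def parse_collection_uri(uri):
    """Split "<hostname>/<organization>/<collection>" into its three parts."""
    parts = uri.strip("/").split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected <hostname>/<organization>/<collection>, got {uri!r}")
    return CollectionId(*parts)


cid = parse_collection_uri("github.com/autowarefoundation/autoware.universe")
print(cid.organization, cid.collection)
```

An index keyed on such identifiers could group packages by organization or by collection without guessing from repository layout.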

These aspects are inspired by Go Packages and Ansible Collections/Galaxy, which make things easier for package discovery and way more modern than existing ROS tooling. Nix packages have nice ideas, too.

I can see that there was a robust discussion around your collections proposal, mostly focused on build tooling. I’m not there, I mostly view the ROS world through the lens of documentation and package discovery.

At a 10,000 foot level, ROS is really two things: a set of frameworks that do useful stuff, and a core middleware for building and using those frameworks. Those frameworks are large things like ros2_control and moveit, and smaller things like ros4hri.

A few frameworks are well known. How is someone supposed to discover the less well-known frameworks, like ros4hri? Really the best I have to offer so far is to search the dev version of rosindex for “msgs” which will show only message packages. That reduces the package count to 265, and you can discover some interesting, less-known frameworks that way. I just wish there was some way I could list, say, 150 frameworks with overview descriptions and links to their repos and other relevant locations.

How could I have discovered these repositories below, for example?

Perhaps rosdistro shouldn’t be assumed to be the only way to distribute/discover ROS packages, as it is pretty cumbersome. What does rosindex need in order to cover non-rosdistro packages? I think this is a better question.

You can ask for a repo to be indexed without formally releasing the package. rosindex even has a UI for this: Add Repository

But as with most formal opt-in options, the adoption of this is very low. I scanned for packages that just had doc entries in rosdistro a few months back, and found very few (IIRC fewer than 10).

I have not had any interactions with the original author of rosindex, but I get the impression that his hope was that it would widely include packages that were not part of the official releases. For that to happen though, IMHO an individual (or group of individuals) would need to take on the task of curating a list of non-official packages that people ought to know about. I’m sure there are probably blog posts or GitHub repos somewhere out there with lists of cool ROS packages, but it takes more than a one-off effort to be sustainable. It would be fairly trivial to include such a curated list in rosindex.

Of course including them into rosindex does not solve the problem I’ve brought up, that rosindex is designed to find packages, but what people really need to find are (stacks/frameworks/projects/collections).

:rkent

Well, somebody needs to curate the data a bit. Or do you suggest a bot should crawl the internet and register anything at least remotely resembling a ROS package?

Submitting a package just for indexing (i.e. no buildfarm builds) is quite easy. A pull request with 5 lines of YAML. But it needs to be tediously repeated for all ROS distros in which the package is assumed to work.
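To show what the repetition looks like, here is a sketch that generates the same doc-only entry for several distros. The entry shape (a `doc` block with `type`, `url`, and `version`) and the `<distro>/distribution.yaml` paths are my assumptions about the rosdistro layout and should be checked against the repository; the repository name and URL are hypothetical.

```python
def doc_only_entry(repo_name, url, branch):
    """Build the YAML lines for one doc-only repository entry (assumed shape)."""
    return [
        f"  {repo_name}:",
        "    doc:",
        "      type: git",
        f"      url: {url}",
        f"      version: {branch}",
    ]


def entries_per_distro(distros, repo_name, url, branch):
    """Map each distro's (assumed) distribution file to the lines to add."""
    return {
        f"{distro}/distribution.yaml": doc_only_entry(repo_name, url, branch)
        for distro in distros
    }


# Hypothetical repository; the distro list is whatever the author supports.
plan = entries_per_distro(
    ["humble", "jazzy"], "my_robot", "https://github.com/example/my_robot.git", "main"
)
for path, lines in plan.items():
    print(f"# {path}")
    print("\n".join(lines))
```

The content is identical in every file, which is the tedium being described.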

So maybe it would be good to create a “gather all” “release” where people could just register their package once and forever (and I wouldn’t use rolling for that). I can even imagine a colcon extension like “colcon register” that would construct the PR automatically. To further increase the number of people using this mechanism, colcon could even check (after build? or when?) whether the package is registered, and if it’s not, it could print a hint in the terminal.

Yes, such packages could be super low quality, but at least they would be more discoverable. To limit the number of packages a bit, a bot could verify the repo URLs and delete the record when a repo ceases to exist.

A problem with the low-quality packages could be that they don’t even have a valid package.xml, or that it contains dependencies on nonexistent packages etc. So ROS Index would have a pretty hard time with them. But the recently added rosdistro PR bot could guide the authors through a few improvement steps.

No, I am suggesting that somebody (or some team) could curate a list of interesting packages that could be included in an official version (or unofficial fork) of rosindex.

Looking at the list from @doganulus, nebula, carma, and isaac are interesting frameworks that would be good to include. ratslam seems to be obsolete. That takes curation of some sort to expose, particularly since the package download statistics would not be available for unofficial packages. It would not be hard though to include other indicators of package usage such as GitHub stars or forks.

I’m sorry, @rkent, you were too fast responding. I was reacting to @doganulus .

Anyways, curated lists created by 3rd parties are a nice thing, but they still miss quite a lot. I think providing more options to 1st parties would increase the coverage the most.

Regarding stacks/collections/groups, could there be a bot that would cluster the package names and contact the maintainers of the cluster with a suggestion for specifying the collection more formally?
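A very naive version of such clustering could group packages by the token before the first underscore. This is only a sketch of the idea; `cluster_by_prefix`, the `min_size` threshold, and the package names are hypothetical, and a real bot would need something smarter for prefixes shared by unrelated projects.

```python
from collections import defaultdict


def cluster_by_prefix(package_names, min_size=2):
    """Group names by the token before the first "_"; keep groups of min_size or more."""
    groups = defaultdict(list)
    for name in package_names:
        prefix = name.split("_", 1)[0]
        groups[prefix].append(name)
    return {p: sorted(ns) for p, ns in groups.items() if len(ns) >= min_size}


# Hypothetical package names.
names = ["foo_driver", "foo_msgs", "foo_bringup", "bar_planner"]
clusters = cluster_by_prefix(names)
print(clusters)
```

Each resulting cluster is a candidate collection whose maintainers could be contacted.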

That would be easily solvable with ROS 1 stacks. A normal package cannot depend on stacks. Only stacks can depend on other stacks. So if your stack would circularly depend on another stack, just turn the stack dependency into a dependency on the stack’s packages.

The type of the dependency doesn’t matter. If anything in repository B depends on something in repository A, repository A has to be released first. Likewise, if anything in repository A depends on something in repository B, then repository B must be released first. It’s impossible for both repository A and repository B to be released first, thus a deadlock because of the circular dependency.
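The deadlock can be shown with a small sketch that lifts package dependencies to the repository level and then tries to compute a release order with a standard topological sort (Kahn's algorithm). The function name `release_order` and the data layout are hypothetical; the A1/A2/B1/B2 names follow the example earlier in the thread.

```python
from collections import defaultdict, deque


def release_order(package_repo, package_deps):
    """Return a repository release order, or None if repositories form a cycle."""
    blocked_by = defaultdict(set)  # repo -> repos that must be released first
    for pkg, deps in package_deps.items():
        for dep in deps:
            src, dst = package_repo[pkg], package_repo[dep]
            if src != dst:  # dependencies inside one repo do not constrain order
                blocked_by[src].add(dst)
    repos = sorted(set(package_repo.values()))
    ready = deque(r for r in repos if not blocked_by[r])
    order = []
    while ready:
        repo = ready.popleft()
        order.append(repo)
        for other in repos:
            if repo in blocked_by[other]:
                blocked_by[other].remove(repo)
                if not blocked_by[other]:
                    ready.append(other)
    return order if len(order) == len(repos) else None


package_repo = {"A1": "A", "A2": "A", "B1": "B", "B2": "B"}
# Metapackage A1 pulls in B2, and metapackage B1 pulls in A2.
cyclic = {"A1": ["A2", "B2"], "B1": ["B2", "A2"]}
print(release_order(package_repo, cyclic))
```

Note that the package-level graph in this example has no cycle at all; the cycle only appears once packages are grouped into the repositories that are released together.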