Possible extended ros index features (feedback requested)

I have some experimental features for rosindex that I would like some feedback on. These are either controversial, or require additional work at the ros-infrastructure level beyond what is currently done. They are designed to make it easier to locate and evaluate potential ROS packages.

The extended version of rosindex is available at https://index.rosdabbler.com. This is hosted on github pages, so should be fairly usable, and I expect to maintain it for at least the near future. But if these features are not incorporated into the main rosindex package, they will probably fade away over time.

Here are the extended features:

  1. Package interest and usage by Github repo stars, and by package download count.

I include a column which shows the Github stars for the repo:

and also the download count from the official ROS package repository:

Note that ‘angles’ is included in an official ROS metapackage, so its relatively high download count might not reflect actual usage. But its decent Github star count suggests there is probably significant usage. ackermann_msgs appears quite popular as well, and it does not benefit from metapackage downloads.

  2. Repository descriptions from Github:

Some sort of repository description should probably have been included at the rosdistro level, but that ship has sailed. Most repos though include a useful description at the Github level.

  3. Search Github for unreleased ROS packages:

Github is searched for “package.xml” files with the appropriate <?xml-model value, and the matching repos are indexed if the repo stars are >15. Example: I’ve been interested in audio lately, but if I search jazzy for audio, nothing comes up. There is a lot more on this github search (which is sorted by github stars). Looking over this list of packages, some seem to be abandoned, but a couple of active projects stand out: opendr_perception and gst_bridge. Personally I find this quite useful.
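
As a rough illustration of the matching step described above, the sketch below checks whether a file's text looks like a ROS package.xml by looking for an `<?xml-model` processing instruction that points at a `package_format` schema. The function name, the regular expression, and the sample manifest are assumptions made for illustration; this is not the code that index.rosdabbler.com actually runs.

```python
import re

# Assumption: a ROS package.xml declares its schema with an xml-model
# processing instruction whose href names a "package_formatN.xsd" file.
XML_MODEL_RE = re.compile(r"<\?xml-model\b[^>]*package_format\d+\.xsd[^>]*\?>")


def looks_like_ros_package_xml(text: str) -> bool:
    """Return True if the text carries a ROS-style xml-model declaration."""
    return XML_MODEL_RE.search(text) is not None


# Hypothetical manifest used only to exercise the check.
sample = """<?xml version="1.0"?>
<?xml-model href="http://download.ros.org/schema/package_format3.xsd" schematypens="http://www.w3.org/2001/XMLSchema"?>
<package format="3">
  <name>example_pkg</name>
</package>
"""

print(looks_like_ros_package_xml(sample))
print(looks_like_ros_package_xml("<package><name>not_ros</name></package>"))
```

A text match like this is deliberately loose: it avoids fully parsing every candidate file, at the cost of missing manifests that omit the xml-model line.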

There seems to be one school of thought that believes that inclusion in official ROS documentation like rosindex should be opt-in. rosindex was originally designed to be a comprehensive list of ROS packages, whether released or not - but still opt-in. It’s fair to say that opt-in inclusion of unreleased packages in rosindex has not worked. Fewer than 10 repos do this, while there are >20,000 ROS repos on Github.

We’ve been moving rosindex lately to be primarily focused on released packages, so this inclusion of a search of Github is rather controversial. The problem though is that there are many widely used ROS packages that do not seem to be officially released. Should we make it easier to find these?


Great idea. Whoever has put a package on Github publicly has to reckon with the possibility that someone will find it :)

I would suggest shortening the descriptions (or perhaps adding multi-line support) so that these columns can have a better description. The mouse-over text is helpful, but just glancing at the page it is unclear to me what these columns mean.

I think a direct link to the project readme would also be fairly helpful.

With respect to including arbitrary Github packages, I think this is a double-edged sword. On one hand, I think it would be great to include “high value” Github packages that are not available as binaries. On the other hand, there are a lot of throw-away repositories / forks out there that would increase the overall noise in search. I feel like we would want to set some sort of minimum star threshold for inclusion in the search. Another useful metric might be last commit date. When I browse repositories on Github, I often use the most recent commit date to determine if the package is being actively maintained.
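
To make the two suggested criteria concrete, here is a minimal sketch that keeps a repo only if it meets a star threshold and has a recent last commit. The dictionary layout, the function name, the repo names, and the two-year age limit are all assumptions for illustration; the 15-star default is borrowed from the limit mentioned elsewhere in this thread.

```python
from datetime import datetime, timedelta, timezone


def is_worth_indexing(repo: dict, now: datetime,
                      min_stars: int = 15, max_age_days: int = 730) -> bool:
    """Keep repos with enough stars and a reasonably recent last commit."""
    last_commit = datetime.fromisoformat(repo["last_commit"])
    recent = now - last_commit <= timedelta(days=max_age_days)
    return repo["stars"] >= min_stars and recent


# Hypothetical repos, not real data.
repos = [
    {"name": "active_popular", "stars": 120,
     "last_commit": "2024-05-01T00:00:00+00:00"},
    {"name": "abandoned_popular", "stars": 300,
     "last_commit": "2018-01-15T00:00:00+00:00"},
    {"name": "active_obscure", "stars": 2,
     "last_commit": "2024-06-10T00:00:00+00:00"},
]

now = datetime(2024, 7, 1, tzinfo=timezone.utc)
kept = [r["name"] for r in repos if is_worth_indexing(r, now)]
print(kept)  # ['active_popular']
```

Both thresholds are parameters because any particular cutoff is a judgment call; a hard age limit would also hide mature packages that simply no longer need changes.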

One other thing to consider is REP-2004 (package quality declaration). While uptake is minimal, I think we would want to sort packages by their overall code quality. Including REP-2004 as a metric might increase uptake.


Some great things here. I made a few notes while checking it out. I hope you find them useful.

  • To me the downloads icon doesn’t seem clear. I understand it’s trying to match the style of “pkg_deps” and “dependants”. It still might benefit from adding the bar at the bottom of the arrow.
  • I like the idea of scraping Github for unpublished packages, but it looks like it’s picking up the forks of existing projects as well, which leads to a lot of repeat packages. And I also wonder why not scrape Gitlab or other public repositories that we use.
  • Originally I thought I couldn’t get just released packages. Then I discovered the filter is released:released. Adding that to the examples in help would make it more discoverable. But it could be more intuitive, something like status:released or released:true.
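
One possible way to cut down on the repeat packages caused by forks is to keep a single entry per package name, preferring a non-fork repo and then the one with the most stars. The sketch below assumes each search hit has already been reduced to a small dictionary; the field names, the function name, and the sample entries are hypothetical.

```python
def dedupe_packages(entries: list) -> list:
    """Keep one entry per package: non-forks first, then highest star count."""
    best = {}
    for entry in entries:
        # Tuples compare element by element, so "not a fork" outranks stars.
        rank = (not entry["fork"], entry["stars"])
        current = best.get(entry["package"])
        if current is None or rank > current[0]:
            best[entry["package"]] = (rank, entry)
    return [entry for _, entry in best.values()]


# Hypothetical search hits, not real repos.
entries = [
    {"package": "example_audio", "repo": "someone/example_audio_fork",
     "fork": True, "stars": 40},
    {"package": "example_audio", "repo": "upstream/example_audio",
     "fork": False, "stars": 25},
    {"package": "example_slam", "repo": "lab/example_slam",
     "fork": False, "stars": 90},
]

print([e["repo"] for e in dedupe_packages(entries)])
# ['upstream/example_audio', 'lab/example_slam']
```

Preferring the non-fork even when a fork has more stars is a design choice; a fork that has become the de facto maintained version would need to be handled separately.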

Also I like Kat’s idea on showing package quality level, though I’m not sure where you’d be able to get that data. It’s probably not a metric that can be trusted to be accurately self-reported.

The current implementation limits Github repos to 15 stars or more, so I am only including about one third of the available repos. Practically, the limit on this is the size of the JSON files required to power the display and search table.

Unfortunately there is a complex bug in the open-source table package we are using that prevents multi-line support from working in our case. Also, including textual descriptions of all of the search columns would take up a lot of space, which you practically could only accommodate by reducing the number of displayed columns. Yes the column icons are hard to understand, but I don’t see a practical alternative.

We could rely more on search via textual input of the search field limits, but I really think the vast majority of users are going to rely on a simple text search field (like “audio” or “slam”) and then rely on the quick column sorting to locate what they are interested in (sort by github stars, or package downloads for example). The very fast sort and scrolling makes this approach feasible.

What am I actually likely to try to implement in the official index.ros.org?

  • github HTML scraping. With this I can get repo description, github stars, and repo tags.
  • package download counts.

The inclusion of unreleased Github packages is much more problematic. That requires using the Github API, which is severely limited in allowed queries. The code to do this takes hours to run while waiting for query rate limits, and pretty much needs to be babysat. I think I will do that manually for a while on index.rosdabbler.com, but not attempt to implement it automatically in the ROS infrastructure. I might be willing to fight those battles if there was a lot of interest in this, but so far I am not seeing enough interest.
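
For anyone curious what the waiting looks like, here is a minimal sketch of the back-off decision. It assumes the response headers include `X-RateLimit-Remaining` and `X-RateLimit-Reset` (a UTC epoch timestamp in seconds), as the Github REST API provides; the function name and the sample values are hypothetical, and this is not the code I actually run.

```python
def seconds_until_reset(headers: dict, now_epoch: float) -> float:
    """Return how long to sleep before the next API query is allowed."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    remaining = int(lowered.get("x-ratelimit-remaining", "1"))
    if remaining > 0:
        return 0.0  # quota left, no need to wait
    reset_epoch = int(lowered.get("x-ratelimit-reset", "0"))
    return max(0.0, reset_epoch - now_epoch)


# Hypothetical header values for illustration.
exhausted = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1700000060"}
available = {"X-RateLimit-Remaining": "5", "X-RateLimit-Reset": "1700000060"}

print(seconds_until_reset(exhausted, now_epoch=1700000000.0))  # 60.0
print(seconds_until_reset(available, now_epoch=1700000000.0))  # 0.0
```

In a real crawl the caller would pass `time.time()` as `now_epoch` and sleep for the returned duration between queries, which is where the hours of run time come from.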