ROS pkg quality metrics: analysing ROS wiki pages

This post fits in with the current topic “Making package quality visible in ROS” which is being discussed in the quality working group meetings.

This post is about analysing wiki pages for packages that provide nodes. Other types of packages will most likely need other types of analysis and / or metrics.


It’s doubtful whether we can truly assess the quality of a wiki page for a ROS package (that would most likely require understanding the text), but looking at some of the tools available, I have the impression that we might at least be able to say something quantitative about a wiki page.

First: it turns out there are (at least) two pages on the wiki that describe what a good-quality wiki page should contain, or at least what a page consistent with the rest of the wiki should contain:

Most of this is style-related: use consistent naming, try to avoid introducing new styling / markup / layouts, etc. Analysing / checking this might be possible with recent deep learning techniques, but I’ll skip that for now. Things that should be (I believe) machine-checkable without too much work are:

  1. presence of customary / required sections
  2. ROS API

Sections

This will be up for discussion, but a “good enough” wiki page of a package should probably contain at least the following sections (not necessarily in this order):

  1. Package Header (auto-generated)
  2. Overview/Introduction
  3. Table of Contents
  4. Requirements
  5. Installation instructions (could be auto-generated, ros-infrastructure/roswiki#121)
  6. Report a Bug (could be auto-generated)
  7. Tutorials (if there are none, make that explicit)
  8. ROS API (see below)

Checking for this is not too difficult: MoinMoin provides access to page contents through its Page class. As MoinMoin pages are plain text files stored on disk, it should also be possible to check them directly (with a script that runs periodically, for instance).
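As a rough illustration, here is a minimal sketch of such a check in Python. The required section names and the heading regex are my assumptions, not an agreed-upon list:

```python
#!/usr/bin/env python
# Sketch: check a MoinMoin page source (plain text on disk) for the customary
# sections listed above. Section names and heading regex are assumptions.
import re
import sys

REQUIRED_SECTIONS = [
    'Overview',
    'Requirements',
    'Installation',
    'Report a Bug',
    'Tutorials',
    'ROS API',
]


def find_sections(page_text):
    # MoinMoin headings look like "= Title =", "== Title ==", etc.
    return [m.group(2).strip()
            for m in re.finditer(r'^(=+)\s*(.+?)\s*\1\s*$', page_text, re.MULTILINE)]


def missing_sections(page_text):
    present = set(section.lower() for section in find_sections(page_text))
    return [s for s in REQUIRED_SECTIONS if s.lower() not in present]


if __name__ == '__main__':
    with open(sys.argv[1]) as f:
        missing = missing_sections(f.read())
    if missing:
        print('Missing sections: %s' % ', '.join(missing))
    else:
        print('All customary sections present')
```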

ROS API

A properly documented package should list the ROS API for all provided nodes. This ROS API consists of:

  • topics (published, subscribed)
  • services (provided, called)
  • actions (provided, called)
  • parameters (read, written), and
  • TF frames (required, provided)

All of these can be documented using a provided ClearSilver template (NodeAPI) and they should be documented per node (see StyleGuide/ROS API).
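For reference, the body of such a NodeAPI block in the page source looks roughly as follows. This is written from memory with made-up node, topic and parameter names, so check the NodeAPI documentation for the exact keys:

```
{{{
#!clearsilver CS/NodeAPI
name = example_node
desc = Short description of what the node does.
pub {
  0.name = scan
  0.type = sensor_msgs/LaserScan
  0.desc = Laser scans produced by the node.
}
sub {
  0.name = cmd_vel
  0.type = geometry_msgs/Twist
  0.desc = Velocity commands consumed by the node.
}
param {
  0.name = ~frame_id
  0.type = string
  0.default = laser
  0.desc = Frame id used for published scans.
}
}}}
```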

Similar to checking for sections, the presence of all of these should be checkable, either via the Page class or by scanning the page source on disk directly.

Some examples of pages that document the ROS API of their nodes (in no particular order):

To make sure that the entirety of a node’s ROS API is documented, it could be extracted using suitable queries against the metamodel that HAROS v3 reverse-engineers for a package, and then checked against the NodeAPI template parameters provided on the wiki page.
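A sketch of what that final comparison could look like, assuming both sides have already been reduced to plain (interface, name, type) tuples (the HAROS queries and the NodeAPI parsing are left out):

```python
# Sketch of the comparison step only. It assumes the ROS API extracted by
# static analysis (e.g. HAROS) and the one documented on the wiki have both
# been reduced to sets of (interface, name, type) tuples.

def compare_apis(extracted, documented):
    """Return (undocumented, stale) as two sets of tuples."""
    undocumented = extracted - documented   # present in the code, missing on the wiki
    stale = documented - extracted          # on the wiki, but no longer in the code
    return undocumented, stale


extracted = {
    ('pub', 'scan', 'sensor_msgs/LaserScan'),
    ('sub', 'cmd_vel', 'geometry_msgs/Twist'),
    ('param', '~frame_id', 'string'),
}
documented = {
    ('pub', 'scan', 'sensor_msgs/LaserScan'),
    ('param', '~frame_id', 'string'),
}

undocumented, stale = compare_apis(extracted, documented)
print('undocumented: %s' % sorted(undocumented))
print('stale: %s' % sorted(stale))
```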

Current status

None of the above is currently checked by the wiki, the buildfarm or any automated system.

There are some tools that automate generating ROS wiki pages. One example is roswiki_node by @DLu.


Good idea! How could this be implemented? I.e., where would the analysis results be displayed? How and when would the analysis be run? …

I can think of two possible approaches:

  1. a MoinMoin macro/plugin that gets called each time a page is rendered
  2. a stand-alone script that gets run periodically

The first would seem to incur quite some overhead on the wiki server, which, if I understand correctly, is already struggling under its current load.

The second could be part of a task run by the buildfarm. The output could then be placed on the repo or doc machine; a MoinMoin macro/plugin would then only need to parse the result (probably some yaml file), which would be much lighter.
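To make that concrete, here is a rough sketch of the wiki-side plugin. It assumes MoinMoin 1.9’s macro_* plugin convention and a hypothetical results directory filled by the periodic job; both are assumptions, not existing infrastructure:

```python
# Sketch of a MoinMoin 1.9 style macro that only renders pre-computed results.
# The results location, file format and macro name are hypothetical; the heavy
# lifting happens in the periodic job, not at page render time.
import os

import yaml

RESULTS_DIR = '/var/lib/roswiki/quality'  # wherever the periodic job drops its output


def macro_PackageQuality(macro, pkg_name):
    # A real version would validate pkg_name before building a path from it.
    path = os.path.join(RESULTS_DIR, '%s.yaml' % pkg_name)
    if not os.path.isfile(path):
        return macro.formatter.text('No quality data available for %s' % pkg_name)
    with open(path) as f:
        data = yaml.safe_load(f)
    missing = data.get('missing_sections', [])
    if not missing:
        return macro.formatter.text('All customary sections present')
    return macro.formatter.text('Missing sections: %s' % ', '.join(missing))
```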

Decoupling the analysis from MoinMoin would probably be a good idea anyway, as it would increase the reusability of this for ROS 2 documentation. You could say that doesn’t really matter though, as the approach described above is tied to MoinMoin anyway (ClearSilver template, looking for sections in pages, etc.).

A disadvantage of a task on the buildfarm would be increased load there, and the possibility of stale information being used by whatever renders the results (as the jobs would be periodic).


And just for future reference, a potentially relevant paper on assessing quality of comments in source code:

Steidl, D., Hummel, B., & Jürgens, E. (2013). Quality analysis of source code comments. In Proceedings of the IEEE International Conference on Program Comprehension (ICPC), pp. 83–92. doi:10.1109/ICPC.2013.6613836

That said, I feel doing something like this is out of scope for this effort.

I prefer the MoinMoin option (my subjective opinion) as it is decoupled from the buildfarm. My rationale is that the analysis (output) doesn’t fit within the scope of the buildfarm.

Well … the buildfarm already runs jobs that generate the status pages (which show the status of packages across the various ROS releases), documentation jobs (that generate the yaml manifests used by the PackageHeader wiki macro) and a few other jobs that don’t directly build software.

This analysis could be part of a job that analyses more quality aspects of registered packages, or it could be a stand-alone job.

Running any sort of analysis on the wiki server at render time is definitely a non-starter. That content is viewed way more often than it changes.

Somewhat as a corollary, the wiki is designed to be easily and openly edited without significant structure. Many of the things already referenced are auto-generated. It might be worth exploring auto-generating more of the content, rather than finding ways to enforce that it is captured in free-form wiki text.

One of the ideas is that prescribing structure (i.e., freedom from choice) would lead to improved readability and usability of the wiki (this is already done with the Package Links box and the Package Header). As you’re probably aware, structure helps because it allows visitors to form habits (i.e., quickly assess the state of something, since it’s easy to compare with other pages).

An added benefit of prescribing structure is that it makes analysis easier, or possible at all. And with analysis comes the possibility of gamification, which appears to work well in other projects and contexts.

Auto-generation would certainly help, but as I wrote above, enforcing some minimal content might be just as effective.

Indeed I completely agree that prescribing structure is highly valuable.

My point is that all the things above that you call out as valuable steps in that direction are actually already outside the wiki content.
I misspoke in saying “the wiki”; I should have said “a wiki”, or “wikis”, are designed to be edited without significant structure. To cite Wikipedia on what a wiki is:

“A wiki engine is a type of content management system, but it differs from most other such systems, including blog software, in that the content is created without any defined owner or leader, and wikis have little implicit structure, allowing structure to emerge according to the needs of the users.” Wiki - Wikipedia

And related to this, the wiki content is not particularly machine-readable and is definitely not stored in an accessible way. To “enforce” content in the wiki is both antithetical to the wiki design and technically very difficult, since you would need to start doing things like parsing the wiki markup. And what do you do if a page “fails” your enforcement check?

I think it would make more sense to extend the package header to include or enforce anything you want, based instead on machine-readable content in the code. This could range from a potential custom yaml definition of the node API, to doxygen comments (which are used by the documentation job and rendered in a standard way for users to view), to package.xml (which is turned into the package header on the wiki), or anything else.
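To illustrate the first option: a custom node API definition kept next to the code could look something like this (an entirely made-up format, just to show the idea):

```yaml
# node_api.yaml -- hypothetical, made-up format kept in the package repository
example_node:
  publications:
    - {name: scan, type: sensor_msgs/LaserScan, description: Laser scans}
  subscriptions:
    - {name: cmd_vel, type: geometry_msgs/Twist, description: Velocity commands}
  parameters:
    - {name: ~frame_id, type: string, default: laser, description: Frame for published scans}
```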

There are also significant issues with the wiki and versioning, and with the wiki getting out of sync with the content of the code. Structured content is often better kept in the source, not on the wiki.

We can extend the ROS documentation website with more structured content. We generally call that the ROS wiki, but actually the majority of the content is already generated outside of the wiki engine. And to continue scaling, we will want to keep encouraging and finding more ways to use structured data outside the wiki engine, as it is specifically not designed for structured data.