What quality metrics do we need to make package quality visible?

During our last Quality Assurance Working Group meeting, we decided to list the quality metrics needed to make package quality visible and to start the discussion around them. Below are the metrics we discussed during the meeting:

  1. CI Badge
  2. Documentation
  3. Open issues report
  4. Result of static analysis run
  5. Type of tests (e.g. unit test, MIL, CI, etc.)
  6. Unit test coverage
  7. User reviews
  8. Issue history statistics
  9. Memory leaks

As a community, we need to discuss and agree on:

  1. How do we define each of these metrics?
  2. What other metrics should we use?
  3. What are the sources of these metrics?

As discussed previously, the ROS package pages already list several of these data points. It would be great to bring them together under a single topic rather than scatter them everywhere. I’ve made a list of items that can be used to define the metrics; suggestions are welcome. This list will be dynamic: I’ll edit it (and repost if required) based on future discussions. I’ve also created some groupings; corrections welcome.

CI

  • Build [Pass/Fail] --> Basic data from the CI tool
  • Unit Tests [Pass/Fail] --> This might require more granularity because some tests are more important than others. Doing this might only be feasible for some core packages
  • Unit Test Crash --> Apparently CI tools can already detect and report this; we just need to surface it
  • Unit Test Coverage [%] --> A diagram showing code test coverage with pass/fail spots, like a heatmap (Are there any free options? codecov? CppDepend has a sample of what it could look like)
  • Static Analysis
    • Code Quality (https://wiki.ros.org/code_quality)
    • Number of Coding Standard errors
    • Cyclic includes (for C++, using cppdep)
    • Cyclomatic complexity (possible tool: Lizard) --> There might already be an existing tool; need to check
  • Dynamic Analysis
    • Clang sanitizers (Address, UB, Memory, Leak, Thread) --> reference; the multiple builds required are a CMake hassle
  • Testing
    • Integration tests, maybe model-based testing… --> Discussion in progress
    • Fuzzy testing by a “chaos node” --> Being discussed along with contracts in ROS. Maybe use pyros-dev or similar tools?

Documentation

  • Status (Maintained, Orphaned, etc.)
  • README (not all packages have one; repository != package)
  • Wiki (GitHub/GitLab, etc., if the content isn’t on the ROS wiki)
  • Getting Started (Tutorials & Debugging Common Bugs)
  • Sphinx/Doxygen links
  • Link to tagged questions on answers.ros.org (as an FAQ)
  • Other resources such as photographed/hand-drawn/generated UML diagrams (e.g. https://www.planttext.com/), etc.
  • User rating/review (maybe for tutorials too, e.g. “How helpful was this?”)

For issues, we’ll need to query the hosting platform’s API (GitHub, Bugzilla, etc.) for the following (a rough sketch for GitHub follows this list):

  • Number of open issues
  • Time to close issue
  • Activity on issues
  • Other statuses (e.g. wont-fix)
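
As a rough sketch of what pulling those numbers could look like for GitHub, using the public REST API (the repository name is a placeholder; in practice you would want an access token to avoid rate limits):

```python
# Sketch: basic issue metrics via the GitHub REST API. The repository name is a
# placeholder; an access token is recommended in practice to avoid rate limits.
import datetime
import requests

REPO = "ros/ros_comm"  # placeholder repository

def open_issue_count(repo):
    # The search API returns a total_count without paging through all results.
    r = requests.get("https://api.github.com/search/issues",
                     params={"q": f"repo:{repo} is:issue is:open"})
    r.raise_for_status()
    return r.json()["total_count"]

def average_time_to_close(repo):
    # Average lifetime of the most recently closed issues (one page of 100).
    r = requests.get(f"https://api.github.com/repos/{repo}/issues",
                     params={"state": "closed", "per_page": 100})
    r.raise_for_status()
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    deltas = [
        datetime.datetime.strptime(i["closed_at"], fmt)
        - datetime.datetime.strptime(i["created_at"], fmt)
        for i in r.json()
        if "pull_request" not in i  # the issues endpoint also returns PRs
    ]
    return sum(deltas, datetime.timedelta()) / len(deltas) if deltas else None

print(open_issue_count(REPO), average_time_to_close(REPO))
```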

Meta information at package level:

  • Efferent coupling: parse the package.xml file
  • Afferent coupling: get dependencies from the list of packages on the wiki
  • Quality summary data à la HAROS

NB: This isn’t an exhaustive list or even a final list. I’ve compiled it based on past discussions.

Current questions:

  • How to visualize the results? (Use the CI tool, or use raw data and other tools to visualize; depends on what the CI offers)
  • Integration tests
  • HAROS: Which data should we show? Dependency graph, dependencies of a package, packages that depend on this package, etc. There is a lot of data, but it is limited to the workspace.
  • Coverage for documentation?
  • Low priority: Model-in-loop or hardware-in-loop tests
  • File bugs after running HAROS? Makes sense only for categories with zero false positives

To ensure testability of C++ packages (even if there are no tests yet but they are to be added in the future), the following metrics could be added to the static analysis list:

  • Cyclic includes: Can limit (white box) testing of packages. Can be checked with cppdep (static analysis).
  • McCabe complexity (often called by other names, e.g. cyclomatic complexity): Can make (white-box) testing of packages practically impossible. Some standards recommend an upper threshold of 15 that should not be exceeded. Can be checked e.g. with lizard (a minimal sketch follows this list).
  • Efferent coupling (on various levels of abstraction): Describes the degree to which the entity under analysis depends on other entities (e.g. at the class level: on which/how many other classes does the class under analysis depend). Further info here. I don’t know if there are free tools capable of reporting this metric.
  • Afferent coupling (on various levels of abstraction): Describes the degree to which other entities depend on the entity under analysis (e.g. at the class level: which/how many other classes depend on the class under analysis). Further info here. I don’t know if there are free tools capable of reporting this metric.
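
For the lizard check, here is a minimal sketch of what the reporting could look like (the source file path is a placeholder, and the threshold of 15 just mirrors the recommendation above):

```python
# Minimal sketch: flag functions whose cyclomatic complexity exceeds a threshold.
# Requires the lizard package (pip install lizard); the file path is a placeholder.
import lizard

THRESHOLD = 15  # upper bound recommended by some standards

info = lizard.analyze_file("src/example_node.cpp")  # placeholder path
for func in info.function_list:
    if func.cyclomatic_complexity > THRESHOLD:
        print(f"{func.name}: CCN={func.cyclomatic_complexity}, "
              f"NLOC={func.nloc} (exceeds {THRESHOLD})")
```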

Fuzzy testing (in its most basic form) is not really dynamic analysis, because the entities are treated as a black box (not a white box).

For ROS2, progress on “ros2fuzzing” is somewhat blocked right now by:

Future (in no specific order)
Design / Concept

  • Python-based launch with stable API, introspectable, optional XML frontend

but it could then be much easier than in ROS1…

It would be possible to show efferent coupling at the package level by parsing <run_depend>, etc., in ROS1 package.xml files. For ROS2 packages one could parse <depend> and <exec_depend> in the package.xml files. From this information one could generate some kind of dependency diagram like HAROS does (as far as I know HAROS works at the workspace level only). This could be combined with the packages’ “quality summary data” (HAROS package coloring). By combining data gathered from public packages one could get information about afferent coupling as well…
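
As a rough sketch of that parsing step (which dependency tags to count is an assumption here; adjust it to whichever coupling definition we settle on):

```python
# Sketch: package-level efferent coupling from a package.xml file.
# Which dependency tags to count is a choice: ROS1 (format 1) uses
# <run_depend>/<build_depend>, ROS2 (format 2/3) uses <depend>/<exec_depend>/etc.
import xml.etree.ElementTree as ET

DEPEND_TAGS = ("depend", "exec_depend", "run_depend", "build_depend")

def efferent_coupling(package_xml_path):
    root = ET.parse(package_xml_path).getroot()
    return {elem.text.strip() for tag in DEPEND_TAGS for elem in root.findall(tag)}

deps = efferent_coupling("package.xml")  # placeholder path
print(f"efferent coupling: {len(deps)} packages -> {sorted(deps)}")
```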

Should Fuzzy testing be moved to a different block containing black-box fuzzing as a sub-component? (Check updated list)

Could you please correct me if I’m wrong about black-box fuzzy testing:

  • It would be similar to using hypothesis (the Python package) to create messages and check the output
  • It would have templates for different message types, not for node-specific requirements, e.g. a range of inputs for a float message, three different ranges for an acceleration, etc. (a rough sketch follows this list)
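
To make that concrete, here is a rough sketch of such message-type templates with hypothesis; the ranges are placeholders, and the ROS plumbing (publishing the values to a node and observing it) is left out:

```python
# Sketch of per-message-type "templates" with hypothesis; the ranges are
# placeholders and the actual ROS publishing/observation is omitted.
from hypothesis import given, strategies as st

# e.g. a bounded float for a Range-like message
range_reading = st.floats(min_value=0.0, max_value=30.0,
                          allow_nan=False, allow_infinity=False)

# e.g. an acceleration split into three bounded components
acceleration = st.tuples(
    st.floats(min_value=-5.0, max_value=5.0, allow_nan=False),
    st.floats(min_value=-5.0, max_value=5.0, allow_nan=False),
    st.floats(min_value=-2.0, max_value=2.0, allow_nan=False),
)

@given(range_reading)
def test_node_handles_any_range_reading(value):
    # Placeholder assertion: in a real test the value would be sent to the
    # node under test and its output/behaviour checked instead.
    assert 0.0 <= value <= 30.0
```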

Would node-specific behavior, such as accepting messages in strictly increasing time order, be included in black-box testing? This obviously relies on a small but nonetheless internal piece of knowledge about how a node should work.

IMO, white-box fuzzing might be helpful for checking corner cases like a typical test, but more along the lines of model-based testing, i.e. developed specifically for the package or node, especially for non-infrastructure packages. On the other hand, black-box fuzzers would be generic and their use would require minimal involvement (only one or a few lines) from the maintainers.

As for development, I think ROS2 with node lifetimes would make such testing both simple and necessary. However, a generic fuzzer on messages, services (and actions) without lifetimes would be similar for ROS1 and ROS2, unless my assumptions are wrong. :confused:
pyros-dev should make the testing more or less similar, right?

Afferent and efferent coupling at the package level are already calculated (by the doc jobs on the buildfarm) and shown on the ROS wiki. They appear in the Package Links box to the right of the Package Summary section of each package that has a doc job.

See the page for roscpp for example:

[screenshot of the roscpp Package Links box]


For fuzzy testing at the ROS “node” level, hypothesis could be used to generate messages, services and parameters according to “commons” or standards. Usually, application-specific customization of these “search strategies” is required (that is the really challenging part of getting reasonable data). It cannot be used to check output.

What do you mean by “accepting messages in strictly increasing time order”? At the ROS “node” level one could e.g. generate a list of subsequent topic messages which could then be thrown at the node. If you care more about the state-based behavior of a node, hypothesis’s support for state-based testing could be more suitable.

  • I don’t think that you can replace model-based testing with fuzzing.
  • You have to customize data generators for most packages anyway to get reasonable data, I guess (you don’t want to evaluate a lot of false positives manually).
  • As far as I know, all fuzzy-testing tools at the source-code level treat classes, functions, etc. as a black box (in the best case they also provide state-based fuzzing, which may be of value if you have e.g. classes with internal state). However, you could combine it with dynamic analysis, which would make it white-box testing overall. That seems to be done quite often…

Yes. And to clarify, I meant checking the Header part of messages and dropping all messages whose Header time is less than the previously received one. This was a simple message-based state that a node can exhibit, and I chose it because it seemed like the simplest example. My bad.
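
Just to illustrate that behaviour (plain floats stand in for the actual Header stamps here):

```python
# Tiny illustration of the behaviour described above: drop any message whose
# Header stamp is not strictly newer than the last accepted one.
# Plain floats stand in for the actual ROS time type.
class MonotonicStampFilter:
    def __init__(self):
        self._last_stamp = float("-inf")

    def accept(self, header_stamp):
        if header_stamp <= self._last_stamp:
            return False  # out-of-order or duplicate stamp: drop it
        self._last_stamp = header_stamp
        return True

f = MonotonicStampFilter()
print([f.accept(t) for t in (1.0, 2.0, 1.5, 3.0)])  # [True, True, False, True]
```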

And you’re right about fuzzing and its limitations. I was talking about making fuzzing + dynamic analysis easier for maintainers, to provide a simpler but robust-enough platform for testing compared to a completely model-driven approach.

The problem (in my opinion) with the current method is that it is a constant-depth graph. As such, there might be a case where a package A is rendered useless because of license incompatibilities caused by its dependency on package B. Finding packages at different levels would be useful, and the current list method could be modified to show a graph. A graph also helps to choose the package with the least proliferation of dependencies when there are two or more similar packages. I do concede that the average ROS user would not be affected by this issue as much as a company. (Also, I’ve noted above that this list is what should be displayed, not what the wiki is lacking. The current info on the wiki isn’t lacking per se, it’s just spread around a bit.)

PS: I don’t think I’ve said this before, but please inform me if I appear confrontational. I find it hard to choose the right sentence structures and … sometimes things go downhill

I was a bit vague in my reply: the data used to render those dependencies on the ROS wiki is available from the buildfarm (stored on the doc host, see indigo/api/fanuc_experimental/manifest.yaml for an example). It’s true that the Package Links box shows only a single level, but following those yaml files to an arbitrary depth is trivial and would enable you to compute recursive dependencies as well.

Note that there are several tools that already do this, though not based on the files on the doc host: the various rospack depends* commands can compute recursive afferent and efferent coupling of packages for you (see the documentation).
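
For completeness, the same thing is scriptable via rospkg (the Python library behind rospack); this assumes a sourced ROS1 environment with the package on the package path, and roscpp is only an example:

```python
# Sketch: recursive efferent/afferent coupling via rospkg, assuming a sourced
# ROS1 environment; 'roscpp' is only an example package name.
import rospkg

rp = rospkg.RosPack()
efferent = rp.get_depends("roscpp", implicit=True)     # packages roscpp depends on (recursive)
afferent = rp.get_depends_on("roscpp", implicit=True)  # packages that depend on roscpp (recursive)
print(len(efferent), len(afferent))
```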

Don’t worry, I’m Dutch.

At the library level it should not be that hard to combine fuzzing + dynamic analysis (as long as the code has no dependencies that would require mocks, tests are not run on-target on “small”-scale embedded systems, etc.), especially if you do not consider integration with ROS1 catkin. At the “node” level it is more challenging, I guess… not only because of the dependencies of the “node” under test. There will probably never be a generic one-size-fits-all solution.

It is the same for me as well. I am German and I write better than I speak :slight_smile: