Are there any outstanding issues that need to be addressed in ROS software quality?

I would like to know if there are any open issues in ROS quality that could be addressed through research.

Can you post more context here? “Research” is a pretty broad term. Also are you interested in ROS, ROS 2, ROS core packages, or other packages?

I am most interested in contributing to ROS and ROS 2 software quality and security, so I am looking for existing research gaps where I can start contributing.

Can you provide more context to the readers, such as your background and interests? How long do you plan to work on the project? What sort of experience do you have with ROS? These questions are going to come up. A single-line post isn’t particularly informative.

My background is in program analysis and software testing research. I recently joined a three-year PhD program, and my topic is developing static analysis tools for robotic systems.

I would like to contribute and dig deeper into the practices followed to secure robotic systems that use ROS. I am currently a beginner, trying to understand the literature on ROS and ROS 2, and I have found that there are various security issues in this area. I would like to explore these issues and provide solutions that mitigate them early in the development of a robotic system.

I have a background in information security and static analysis, but my time in robotics is short.

As I see it, the main challenge to static and dynamic analysis tooling for ROS is that it uses a multi-process architecture.

No existing static analysis tool that I know of is going to be able to trace execution across a process boundary. Many of them struggle to do so across translation units.

Because topic names in a ROS program are runtime strings, and often dynamic ones, it is not easy to statically analyze a codebase and determine which components talk to which others.

Therefore I think the most promising techniques would be centered around dynamic analysis.
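A minimal C++ sketch of the problem described above. `RecordingNode`, `robot_topic` and `setup_publishers` are hypothetical stand-ins, not the real `ros::NodeHandle` API: the same source code advertises different topics depending on a namespace value that only exists once the process runs (in practice it might come from a launch-file argument or a parameter).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a node handle: it only records which topic
// names get advertised. This is NOT the real ros::NodeHandle API.
struct RecordingNode {
  std::vector<std::string> advertised;
  void advertise(const std::string& topic) { advertised.push_back(topic); }
};

// The namespace is a runtime value, so the final topic string never
// appears anywhere in the source code.
std::string robot_topic(const std::string& robot_ns, const std::string& suffix) {
  assert(!robot_ns.empty() && "namespace must be non-empty");
  return "/" + robot_ns + "/" + suffix;
}

void setup_publishers(RecordingNode& node, const std::string& robot_ns) {
  node.advertise("/cmd_vel");                     // literal: trivial to extract statically
  node.advertise(robot_topic(robot_ns, "scan"));  // depends on a value known only at run time
}
```

A static extractor can read `/cmd_vel` straight from the source, but to resolve the second topic it would have to propagate `robot_ns` from wherever it is configured into the call, which is exactly the step that gets hard across files and processes.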


Some project ideas:

  1. Identify a common bug that affects ROS developers, and then figure out how to create or adapt a tool to find instances of that bug automatically.

  2. Prototype better APIs for writing nodes that avoid common issues. I think the work some have done on integrating state machines is interesting here for taming complexity.

  3. Build ROS core packages with Address Sanitizer, Thread Sanitizer, etc. and run their test suites. Do they have any bugs?

  4. Contribute ASan/TSan support to the ROS build tools like catkin, the buildfarm, etc. so that the community can more easily test their code for these issues.

  5. Write a fuzzer for ROS messages to try to make nodes crash.

  6. Add a layer of cryptography for the network packets sent around by ROS. Imagine you have multiple robots that communicate with one another wirelessly, and one is captured. Can you prevent the encryption from being compromised?
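For idea 5, a toy C++ sketch of the core loop of a mutation fuzzer. Everything here is hypothetical: `decode_string` is an invented decoder for a length-prefixed string (a little-endian `uint32` length followed by the bytes) that trusts its length field, and it uses `.at()` so that an overrun surfaces as an exception the fuzzer can count. A real campaign would use a coverage-guided fuzzer such as libFuzzer or AFL against real message deserialization, built with AddressSanitizer so that genuine memory errors crash the process.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <random>
#include <stdexcept>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical encoder: uint32 little-endian length prefix, then the bytes.
Bytes encode_string(const std::string& s) {
  Bytes buf;
  auto len = static_cast<std::uint32_t>(s.size());
  for (int i = 0; i < 4; ++i) buf.push_back(static_cast<std::uint8_t>(len >> (8 * i)));
  buf.insert(buf.end(), s.begin(), s.end());
  return buf;
}

// Hypothetical decoder that blindly trusts the length field (the bug to find).
// .at() throws std::out_of_range on an overrun instead of reading past the buffer.
std::string decode_string(const Bytes& buf) {
  std::uint32_t len = 0;
  for (int i = 0; i < 4; ++i) len |= std::uint32_t(buf.at(i)) << (8 * i);
  std::string out;
  for (std::uint32_t i = 0; i < len; ++i) out.push_back(static_cast<char>(buf.at(4 + i)));
  return out;
}

// Minimal mutation fuzzer: overwrite one random byte of a valid seed per
// iteration and count the inputs that make the target throw (our "crash").
int fuzz(const Bytes& seed, const std::function<void(const Bytes&)>& target,
         int iterations, unsigned rng_seed) {
  assert(!seed.empty());
  std::mt19937 rng(rng_seed);
  std::uniform_int_distribution<std::size_t> pos(0, seed.size() - 1);
  std::uniform_int_distribution<int> byte(0, 255);
  int failures = 0;
  for (int it = 0; it < iterations; ++it) {
    Bytes input = seed;
    input[pos(rng)] = static_cast<std::uint8_t>(byte(rng));
    try {
      target(input);
    } catch (const std::exception&) {
      ++failures;
    }
  }
  return failures;
}
```

Mutating one of the four length bytes quickly produces a length larger than the buffer, which is the class of bug a message fuzzer is meant to surface.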

Welcome @tareq97, glad to have you in the community researching security. There’s the Security WG, which you may want to consider joining. That’s probably the best place to learn what’s most needed and how you can contribute to the ROS community.

It would also help you get up to speed with the current status before you define your research direction. The Security WG is focused on ROS 2 efforts, and that’s what I’d recommend you consider as well.

@rgov provides some interesting feedback, but I’d consider the following:

On not being able to statically determine which components talk to which others: this is not necessarily true in my experience. There are different efforts trying to capture the complexity of the computational graph. Refer to @afsantos’ work on HAROS (GitHub: git-afsantos/haros, “H(igh) A(ssurance) ROS - Static analysis of ROS application code”).

This was already done in the past. I’m not implying you shouldn’t research in this direction, but I’d encourage you to build on top of what exists and either extend it or contribute by finding new flaws or implementing mitigations. Various pieces were discussed or announced in this same forum, and you should be able to find them easily.

I had read one report about instrumenting ROS with ASan, but the conclusion of the “paper” was basically a rant about how hard it is to compile ROS! I’d be interested in reading about other attempts.

Well, it’s not a paper, but what about “Introducing ROS2 Sanitizer Report and Analysis”?

Thanks for mentioning HAROS, @vmayoral !

I’ve recently concluded my thesis (still waiting for my public examination), so HAROS is reaching the end of a milestone, and I am aiming for several improvements to the tool in the future.
Not many people are aware of the most recent developments of HAROS, so I’ll leave here a summary and some research directions.

Warning: long post. If you’d rather read a short paper, we have one fresh out of the oven (accepted, still in press): [2103.01603] The High-Assurance ROS Framework

Right now, HAROS is used as a static analysis tool for:

  1. measuring internal code quality (coding standards, metrics, what most people know/use HAROS for);
  2. extracting architectural models (i.e., the ROS computation graph) and verifying properties over the architecture;
  3. annotating models (we call them configurations) with behavioural properties.

These behavioural properties use a small, pattern-based, message-oriented language that I am developing. E.g.:

```
globally:
  /cmd_vel {linear.x = 0 and angular.z = 0}
  requires
  (/bumper {state = PRESSED} or /wheel_drop {state = DROPPED})
  within 200 ms
```

This example means

If I see a message on /cmd_vel such that linear.x = 0 and angular.z = 0, then I must have observed either a message on /bumper such that state = PRESSED or a message on /wheel_drop such that state = DROPPED in the 200 milliseconds immediately preceding the /cmd_vel message.
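To make the semantics concrete, here is a hand-written C++17 sketch of a runtime monitor for this one property. It is not the code HAROS generates: the class `StopRequiresTrigger`, its callbacks, and the millisecond timestamps are hypothetical, messages are assumed to arrive in timestamp order, and the 200 ms bound is treated as inclusive.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical monitor for:
//   /cmd_vel {linear.x = 0 and angular.z = 0} requires
//   (/bumper {state = PRESSED} or /wheel_drop {state = DROPPED}) within 200 ms
class StopRequiresTrigger {
 public:
  explicit StopRequiresTrigger(std::int64_t window_ms) : window_ms_(window_ms) {
    assert(window_ms > 0);
  }

  // Either trigger event satisfies the "or" in the property.
  void on_bumper(std::int64_t t_ms, bool pressed) { if (pressed) last_trigger_ms_ = t_ms; }
  void on_wheel_drop(std::int64_t t_ms, bool dropped) { if (dropped) last_trigger_ms_ = t_ms; }

  // Returns false (and records a violation) if this /cmd_vel message breaks the property.
  bool on_cmd_vel(std::int64_t t_ms, double linear_x, double angular_z) {
    if (linear_x != 0.0 || angular_z != 0.0) return true;  // pattern not matched: no obligation
    bool ok = last_trigger_ms_ && t_ms - *last_trigger_ms_ <= window_ms_;
    if (!ok) ++violations_;
    return ok;
  }

  int violations() const { return violations_; }

 private:
  std::int64_t window_ms_;
  std::optional<std::int64_t> last_trigger_ms_;  // most recent PRESSED/DROPPED event
  int violations_ = 0;
};
```

Keeping only the most recent trigger is enough: if any trigger fell inside the window before a stop command, the latest one did too, so the monitor needs constant memory regardless of message rate.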

I have implemented (and am currently refining) plug-ins that take these properties and automatically generate runtime monitors and property-based tests.
An MSc student of ours implemented another plug-in that fed these properties to a model checker (Electrum). However, the plug-in is currently outdated, and I am working on bringing it up to date.

We have MSc students currently working on:

  • integrating variational analyses into HAROS (i.e., analysing families of products at once, instead of one configuration/launch file at a time);
  • experimenting with software model checkers (CBMC) to verify the behavioural properties directly in the code;
  • experimenting with other model checkers (SPIN, TLA+ and Z3, specifically).

There are a number of things that could be done/improved here.
Some examples:

  • Develop equivalent methods for ROS2. The replacement of launch files with python scripts is a challenge; I suspect a mix of static and dynamic analyses will be the best bet.
  • Static binary analysis, i.e., extracting publishers, subscribers, etc., from pre-compiled binaries, instead of source code (some reverse engineering tools, like radare2, are promising, but I do not have the time to invest).
  • Extract the message flow from the source code, i.e., the architectural models would know that a node does not simply advertise a topic, but also only publishes on it at a given frequency, or reactively in callbacks (this is something I might do, if nobody else picks it up).
  • If the previous point is implemented, then why not infer behavioural properties while we’re at it (globally: /input causes /reactive_msg)?
  • If the message flow is integrated into the model, we can effectively trace execution across process boundaries, @rgov .
  • Develop simple static analysis tools that catch nonsense code patterns (I’ve seen many, and have archived a few that I can share). E.g., creating one thread to process callbacks, while the main thread stays idle.
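```cpp
// One background thread processes callbacks...
ros::AsyncSpinner spinner(1);
spinner.start();
// ...while the main thread just sleeps until shutdown is requested.
while (!gShutdownRequest) { ros::Duration(0.05).sleep(); }
```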

This pattern can be replaced with:

```cpp
ros::spin();
```
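A deliberately naive C++ sketch of a checker for the AsyncSpinner pattern above. It works on raw source text with `std::regex`, so it is easy to fool (comments, macros, unusual formatting); the function name `has_idle_async_spinner` is hypothetical, and a serious tool would match on the AST, for example with Clang's tooling, rather than on text.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Flags source text that both creates a single-threaded AsyncSpinner and
// keeps the main thread in a loop whose only body is a ros::Duration sleep.
bool has_idle_async_spinner(const std::string& source) {
  // e.g. "ros::AsyncSpinner spinner(1)"
  static const std::regex one_thread_spinner(R"(ros::AsyncSpinner\s+\w+\s*\(\s*1\s*\))");
  // e.g. "while (!gShutdownRequest) { ros::Duration(0.05).sleep(); }"
  static const std::regex sleep_only_loop(
      R"(while\s*\(.*?\)\s*\{\s*ros::Duration\s*\([^)]*\)\s*\.\s*sleep\s*\(\s*\)\s*;\s*\})");
  return std::regex_search(source, one_thread_spinner) &&
         std::regex_search(source, sleep_only_loop);
}
```

Requiring both matches keeps false positives down: a multi-threaded spinner, or a main loop that does real work, is not reported.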

Since @tareq97’s background seems to be partially in software testing, you could also improve on our existing analysis technique.
We recently tested a path planner for agricultural robots (see the report), and I noticed many details that are currently cumbersome when the properties involve images, maps, or a lot of numeric calculations.
Alternatively, there’s also this idea of using Gazebo in headless mode under the hood, to test full systems (as opposed to pure-software nodes) in specific scenarios.

Hope this helped!

Yes, exactly. The ASAN and TSAN jobs that are mentioned in that post are still running, and we do look at the output and try to keep them green. I will note that those jobs only test a small subset of the ROS 2 core currently; expanding that to test more of the core is something we’d love to have.

But that is just talking about the ROS 2 core. The other thing we’d love to have is ASAN and TSAN jobs on https://build.ros2.org that any package maintainer can opt-in to. However, there are practical limitations on the number of jobs we can support in Jenkins.

In any case, none of this is “research”, as such, but is all necessary to have a mature and stable core.

Sorry for the late reply; I had some personal matters to take care of in the last few days. Thanks @rgov @vmayoral @clalancette @gavanderhoorn for starting the discussion in this thread.

Also, @afsantos, thanks for sharing the details of the HAROS tool and the improvements that can be made to it.

Currently I am looking into issues in ROS-related GitHub repos to find patterns in the code fixes and to encode these patterns as rules in a static code analysis tool. I was intrigued by the point about nonsense code patterns, as these could also be considered when writing the rules for the tool. It would be interesting if @afsantos could share these patterns. So far I have only looked for patterns related to issues that are already fixed, but it would be good to explore redundant code patterns in the ROS repos as well.

I am also very interested in the other ideas shared, but I am new to ROS and trying to grasp as much as I can, so I want to start by focusing on something I can achieve.

Thanks for sharing all the research-related documents.

@tareq97 I opened a new thread to share some code patterns, so that this thread does not go off-topic.

Hi @rgov

Can you please share the report or research paper on instrumenting ROS with ASan? It would be an interesting read.

Hi, I couldn’t find it again after a brief search. But the report was not very in-depth and mostly complained about the difficulty of compiling ROS from scratch 🙂 The link from @gavanderhoorn looks more thorough.