Safety-critical WG

Some time ago I was asked to lead a working group looking at the use of ROS 2 in safety-critical systems. These are systems that can cause harm to people or the environment, and I think most of us agree that a large number of robot applications fall into this category.

The working group will look at topics including:

  • Documenting how to use ROS 2 in a safety-critical application
  • Use of tools to support the above
  • Additional processes, tools and methods needed for building a safety-critical robot that are not currently covered by something in ROS but could be
  • How to make the client libraries usable in a safety-critical system, and work on safety-focused client libraries (for example, a SPARK client library)
  • Cross-over issues with the QA and real-time working groups for infrastructure, tooling and methods
  • Cross-over issues with the navigation and manipulation working groups for sample applications
  • Anything else safety-related someone brings along

In the interests of getting this thing moving rather than stalling while I try to get a nice, formal proposal together, I’m going to begin with just a call for participation and a time for the first meeting.

ROS 2 Safety-critical WG

San Francisco, USA Wed, 8 May 2019 at 15:00 PDT
Chicago, USA Wed, 8 May 2019 at 17:00 CDT
Washington DC, USA Wed, 8 May 2019 at 18:00 EDT
Barcelona, Spain Thu, 9 May 2019 at 00:00 CEST
Berlin, Germany Thu, 9 May 2019 at 00:00 CEST
Tokyo, Japan Thu, 9 May 2019 at 07:00 JST
Corresponding UTC Wed, 8 May 2019 at 22:00

Please join my meeting from your computer, tablet or smartphone.
https://global.gotomeeting.com/join/206092957

You can also dial in using your phone.
United States: +1 (312) 757-3117

Access Code: 206-092-957

More phone numbers
Australia: +61 2 9091 7603
Austria: +43 7 2081 5337
Belgium: +32 28 93 7002
Canada: +1 (647) 497-9373
Denmark: +45 32 72 03 69
Finland: +358 942 72 0972
France: +33 187 210 241
Germany: +49 692 5736 7300
Ireland: +353 15 295 146
Italy: +39 0 230 57 81 80
Netherlands: +31 202 251 001
New Zealand: +64 9 913 2226
Norway: +47 21 93 37 37
Spain: +34 932 75 1230
Sweden: +46 853 527 818
Switzerland: +41 225 4599 60
United Kingdom: +44 20 3713 5011

New to GoToMeeting? Get the app now and be ready when your first meeting starts:
https://global.gotomeeting.com/install/206092957

One of the things to discuss in that meeting will be the time for and frequency of regular meetings after that, as well as mundane topics like where to keep information.


As it turns out, safety is not that interesting a topic for most people. Who would’ve guessed? :wink:

It was just myself and Nick Burek from AWS in the meeting today, but we did throw around a few ideas for things the working group could do.

  • The lowest-hanging fruit identified was to put some resources towards the Rust client library. Rust is a powerful systems programming language, and it also has features that make it useful for safety-critical and real-time systems. Esteve’s client library exists, but more work needs to be done and currently not enough resources are going into it.
  • Related to the above would be to get the SPARK client library into a usable shape and make it feature complete. SPARK is a variant of Ada designed for dependable systems. We discussed how a minimal SPARK client library could be useful for the safety-critical parts of a robot, especially the lower-level parts such as a monitor.
  • There are many verification tools that would be useful to have supported by colcon in the same way linters are. Thought needs to be given to exactly which tools would be useful, with an eye to ones that are going to work well in CI. Design verification tools like TLA+ and Spin are possibly going to be less useful in this situation than implementation verification tools, such as symbolic execution tools like KLEE.
  • More abstractly, we talked about how it would be useful to have formal specifications of nodes in terms of outputs produced from inputs (think of something like a disjoint transfer function). These could be used to generate test cases that are representative of large input spaces without needing to run huge numbers of tests, helping with combinatorial testing of nodes.
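
To make the last idea a little more concrete, here is a minimal sketch of generating test cases from a partitioned input specification. Everything in it is an assumption made for illustration: the speed-limiter "node", its thresholds, the partitions and the function names are all hypothetical and not taken from any ROS package. It only shows how one representative per input region can stand in for a much larger input space.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Partition:
    """One equivalence class of an input, with a value that represents it."""
    name: str
    representative: float


# Hypothetical input partitions for an imaginary speed-limiter node.
DISTANCE_PARTITIONS = [
    Partition("blocked", 0.5),   # obstacle closer than 1 m
    Partition("near", 3.0),      # obstacle between 1 m and 5 m
    Partition("clear", 20.0),    # obstacle 5 m away or more
]
SPEED_PARTITIONS = [
    Partition("stopped", 0.0),
    Partition("slow", 0.3),
    Partition("fast", 1.5),
]


def spec(distance, speed):
    """Specification: a piecewise function over disjoint distance regions."""
    if distance < 1.0:
        return 0.0
    if distance < 5.0:
        return min(speed, 0.5)
    return min(speed, 2.0)


def limit_speed(distance, speed):
    """Stand-in for the node logic under test (hypothetical)."""
    cap = 0.0 if distance < 1.0 else (0.5 if distance < 5.0 else 2.0)
    return min(speed, cap)


def generate_cases():
    """Produce one test case per combination of input partitions."""
    return [
        (d.representative, s.representative)
        for d, s in product(DISTANCE_PARTITIONS, SPEED_PARTITIONS)
    ]


def check(implementation):
    """Return the cases where the implementation disagrees with the spec."""
    failures = []
    for distance, speed in generate_cases():
        expected = spec(distance, speed)
        actual = implementation(distance, speed)
        if abs(expected - actual) > 1e-9:
            failures.append((distance, speed, expected, actual))
    return failures


if __name__ == "__main__":
    print(len(generate_cases()), "cases,", len(check(limit_speed)), "failures")
```

Nine cases cover every combination of the three distance regions and three speed regions. A real specification would also want boundary values for each region, which this sketch leaves out.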

To clarify: the build tool colcon knows nothing about linters. They are being invoked from the build system in either CMake as CTests or in Python packages as unit tests.

Sorry, I mixed that up. I meant to say ament, like the existing ament extensions.

I wanted to be there but the time zone and other problems prevented me from attending :confused:

I’m still working on the Ada client lib (updating it to Crystal atm). I have little SPARK expertise, but it’s definitely a topic in which I want to invest time, so I’ll try to get involved in that part if it ever moves forward.

If the Ada client library gets completed then that would provide a good base to build a SPARK one on.

Geoff, sorry we missed the first meeting, we’d like to participate.

@mjeronimo and @lbegani are looking at safety from the Intel ROS2 team.


OK, since there have been a few more expressions of support now, let’s take another stab at having a kick-off meeting.

Let me know your availability through the poll below and we will see if we can find a meeting time that sucks for as few people as possible.


Thanks to all those who filled in their preferred times. We will meet at this time:

Tuesday, June 4, 2019 2:00 PM UTC (23:00 JST).

https://www.timeanddate.com/worldclock/fixedtime.html?msg=ROS+2+Safety+WG+kickoff+%232&iso=20190604T23&p1=248&ah=1

If you want a calendar invite, please let me know your email address by PM.


ROS 2 TSC Safety WG kickoff #2

Please join my meeting from your computer, tablet or smartphone.
https://global.gotomeeting.com/join/205619637

You can also dial in using your phone.
United States: +1 (646) 749-3129

Access Code: 205-619-637

More phone numbers
Australia: +61 2 8355 1050
Austria: +43 7 2081 5427
Belgium: +32 28 93 7018
Canada: +1 (647) 497-9391
Denmark: +45 32 72 03 82
Finland: +358 923 17 0568
France: +33 170 950 594
Germany: +49 692 5736 7317
Ireland: +353 15 360 728
Italy: +39 0 230 57 81 42
Netherlands: +31 207 941 377
New Zealand: +64 9 280 6302
Norway: +47 21 93 37 51
Spain: +34 932 75 2004
Sweden: +46 853 527 836
Switzerland: +41 225 4599 78
United Kingdom: +44 330 221 0088

New to GoToMeeting? Get the app now and be ready when your first meeting starts:
https://global.gotomeeting.com/install/205619637


Hello @gbiggs ,

I’m trying to log in but getting

Time seems right though:


I’m seeing the same, so you’re not alone.


Sorry, everyone. Various problems compounded to make me 35 minutes late for the meeting, by which time most people had given up, it appears. Based on the previous poll, I’d like to reschedule for the following time:

Wednesday, June 5, 2019 2:00 PM UTC (23:00 JST).

https://www.timeanddate.com/worldclock/fixedtime.html?msg=ROS+2+Safety+WG+kickoff+%232&iso=20190605T23&p1=248&ah=1

I will post connection information tomorrow. Please let me know as soon as possible if this time doesn’t work for you.


Here’s the connection information.

ROS 2 TSC Safety WG kickoff #2a
Wed, 5 Jun 2019 23:00 - 00:00 JST

Please join my meeting from your computer, tablet or smartphone.
https://global.gotomeeting.com/join/282071125

You can also dial in using your phone.
United States: +1 (646) 749-3129

Access Code: 282-071-125

More phone numbers
Australia: +61 2 8355 1050
Austria: +43 7 2081 5427
Belgium: +32 28 93 7018
Canada: +1 (647) 497-9391
Denmark: +45 32 72 03 82
Finland: +358 923 17 0568
France: +33 170 950 594
Germany: +49 692 5736 7317
Ireland: +353 16 572 651
Italy: +39 0 230 57 81 42
Netherlands: +31 202 251 017
New Zealand: +64 9 280 6302
Norway: +47 21 93 37 51
Spain: +34 932 75 2004
Sweden: +46 853 527 827
Switzerland: +41 225 4599 78
United Kingdom: +44 20 3713 5028

New to GoToMeeting? Get the app now and be ready when your first meeting starts:
https://global.gotomeeting.com/install/282071125

Thanks for re-organizing, @gbiggs. Unfortunately I won’t be able to make it today, but I will look through the notes if they become available :slight_smile:

Thank you to those who joined the meeting. We had a good discussion and identified some concrete areas where we can take action. We also identified that our biggest roadblock is, as always, resources.

Here are my notes from the meeting.

Participants

  • Alejandro Mosteo, author of the Ada client library. Works with drones and doesn’t have a specific safety need, but it is a topic of interest. Will continue maintaining the Ada client library and is interested in SPARK.
  • Brad Baillio, working on autonomous vehicles in off-road situations like mining and agriculture. Safety is an obvious need.
  • Matt Droter, ROS Agriculture. Uses ROS for farming, so is trying to figure out how to make it safe and what general best practices can be applied.
  • Nick Burek, AWS RoboMaker. Multiple groups at Amazon are doing robotics for warehouses, so the robots are safety-critical.
  • Geoff Biggs, working on self-driving vehicles at Tier IV, where the importance of safety is obvious.

Discussion

  • Client libraries:
    • The Ada client library works with the Bouncy version of rcl, and is feature complete for Bouncy. Alejandro is now going to update it to work with the Dashing rcl.
    • Alejandro plans to start working with SPARK in the near future.
    • Rust is intriguing as a language that could be useful in safety, but it is not yet widely used in ROS. Working to classify Rust would be a huge job and probably beyond our capabilities.
  • Tools in the workflow useful for safety-critical development with ROS.
    • AWS work with TSAN and ASAN is now at the stage that reports are being generated in nightlies.
    • The Automated Reasoning Group at Amazon uses a tool called C Bounded Model Checker (CBMC) that allows you to write proofs against your code and check the whole valid range of inputs and outputs. It does C++ as well now, but is not great at multi-threaded logic. It is a good tool to find the last bugs in your code.
    • The same group is also using many other tools. Nick Burek will provide a list of tools being used.
  • ROS QA WG is working on integrating the High Assurance ROS tool into the build farm.
  • There is interest in the work that Apex.AI is doing. Are we duplicating their work, or are we trying to make their work open source? Are we competing?
  • No one is particularly interested in and/or has the necessary skills to do formal specifications of nodes.
  • Classifying tools used in ROS would be particularly valuable.
    • We could try to classify launch2 or colcon. Both would be valuable for the community.
    • We may be able to try and get Apex.AI on board for classifying launch, in particular its input specification.
  • Most of the things discussed so far are tools that support reliability. What could we do to help people make sure their designs using ROS are safe?
    • One particularly good idea is to come up with sample structures in ROS for common architecture patterns used in safety-critical systems, such as a 2oo3 architecture, or how a safety monitor should be implemented.
    • Document how callback groups and the threading models work with different executors, so that people have a guide for how not to deadlock themselves, for example.
    • Provide sample safety cases for ROS-based systems. ROS Agriculture has a small lawn tractor application, which is well-defined and has clear safety concerns. This could be a good sample application to work with. https://github.com/ros-agriculture/ros_lawn_tractor
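
As a rough illustration of the kind of sample structure meant by a 2oo3 architecture, here is a minimal sketch of a two-out-of-three voter in plain Python. It is not ROS code and not something the group produced. The function names, the tolerance value and the choice of stopping (a command of 0.0) as the safe state are all assumptions made for the example.

```python
from itertools import combinations
from typing import NamedTuple, Optional, Sequence


class Vote(NamedTuple):
    value: Optional[float]  # agreed value, or None when no majority exists
    agreed: bool            # True when at least two channels agree


def vote_2oo3(channels: Sequence[Optional[float]], tolerance: float) -> Vote:
    """Vote over three redundant channels.

    A channel that has failed or gone silent is passed as None. Two live
    channels agree when their values differ by no more than `tolerance`.
    The result is the mean of the first agreeing pair found.
    """
    if len(channels) != 3:
        raise ValueError("expected exactly three channels")
    live = [c for c in channels if c is not None]
    for a, b in combinations(live, 2):
        if abs(a - b) <= tolerance:
            return Vote((a + b) / 2.0, True)
    return Vote(None, False)


def safe_speed_command(channels: Sequence[Optional[float]],
                       tolerance: float = 0.05) -> float:
    """Use the voted value, or fall back to the assumed safe state (stop)."""
    vote = vote_2oo3(channels, tolerance)
    return vote.value if vote.agreed else 0.0


if __name__ == "__main__":
    print(safe_speed_command([1.00, 1.02, 5.00]))  # one channel disagrees
    print(safe_speed_command([1.00, None, None]))  # two channels lost
```

In a ROS system each channel would presumably be a separate node or process, with the voter running in the most trusted part of the system. How to lay that out is exactly what a documented sample architecture would need to answer.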

Proposals

  • Things to attempt:
    • Document threading models used in ROS 2.
    • Classify launch, especially its input specification, to find the things you should not do with it in a safety-critical system.
    • Try CBMC and consider how it might be integrated into ament so it can be easily used in CI and the build farm.
    • Develop some sample architectures for using ROS in a safety architecture, and document them.
    • (Longer term) Produce a sample safety case for the ROS Agriculture small lawn tractor application.
  • Possible effort contributions:
    • Alejandro Mosteo can contribute some personal time, and if a student with interest comes along…
    • Brad Baillio can contribute some personal time.
    • Matt Droter can act as a conduit to the ROS QA WG to coordinate related efforts.
    • Nick Burek will try and get some time on the next sprint at AWS for trying CBMC on at least one ROS package.
    • Geoff Biggs can contribute work on sample architectures for safety-critical systems, will look into documenting the threading models, and over a longer term will work on the sample safety case with Matt.
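
To show the kind of trap the threading documentation should warn about, here is a small analogy using only the Python standard library. It deliberately does not use the rclcpp or rclpy executor APIs. The thread pool stands in for an executor, and the function names are made up for the example. A callback that blocks waiting for work that only the same single-threaded executor can run will never get its answer.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def callback_that_waits(pool, timeout=0.2):
    """Runs on a pool thread, then blocks on more work given to the same pool.

    This mimics a callback that makes a blocking call whose response must be
    handled by the executor the callback itself is running on.
    """
    inner = pool.submit(lambda: "response")
    try:
        return inner.result(timeout=timeout)
    except FutureTimeout:
        # Without the timeout this wait would never return: a deadlock.
        return "timed out"


def run(workers):
    """Run the callback on a pool with the given number of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outer = pool.submit(callback_that_waits, pool)
        return outer.result()


if __name__ == "__main__":
    print("1 worker: ", run(1))
    print("2 workers:", run(2))
```

With one worker the inner job sits in the queue behind the callback that is waiting for it, so the wait can only end by timing out. With two workers the inner job runs on the second thread and the callback gets its response.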

Thanks for the detailed minutes, Geoff.

As for your last entry on threading models, you may find something of interest in the Ravenscar profile of Ada, which specifically targets multithreading in high-integrity contexts:

For those of you interested in formal proof of complete systems, above the source code level, this project was making the rounds in the Ada community some time ago. I think they used Z for the user interface spec (link to github at the end):

Apologies for the spam if you were already aware.

Cheers,
Alejandro.

Thanks for that information! I’m always happy to be introduced to new samples of using Z to learn from.

I’d like to set our next meeting in the first week of July. It’s a little way off, but we will start doing more frequent meetings when we start getting more active.

Here’s a poll for the meeting time.

After this meeting, I would like to set a regular meeting schedule. I will send out a poll for that later.


Thanks to those who provided their availability. I have chosen the time that the most people are available for, which is:

Wednesday, July 3, 2019 2:00 PM UTC.

Sorry for the 7AM start for those on the west coast of the USA.

Here is the meeting participation information. If you want an invite, please send me a DM with your email address.


ROS 2 Safety WG

Please join my meeting from your computer, tablet or smartphone.
https://global.gotomeeting.com/join/485233997

You can also dial in using your phone.
United States: +1 (571) 317-3116

Access Code: 485-233-997

More phone numbers
Australia: +61 2 9091 7603
Austria: +43 7 2081 5337
Belgium: +32 28 93 7002
Canada: +1 (647) 497-9373
Denmark: +45 32 72 03 69
Finland: +358 942 72 0972
France: +33 187 210 241
Germany: +49 693 8098 999
Ireland: +353 15 295 146
Italy: +39 0 230 57 81 80
Netherlands: +31 202 251 001
New Zealand: +64 9 913 2226
Norway: +47 21 93 37 37
Spain: +34 932 75 1230
Sweden: +46 775 757 471
Switzerland: +41 225 4599 60
United Kingdom: +44 20 3713 5011

New to GoToMeeting? Get the app now and be ready when your first meeting starts:
https://global.gotomeeting.com/install/485233997

Dear all,

I’m truly sorry but recent developments will keep me out of touch during the Jul 2-3 period, so I will miss the meeting.

Best,
Alejandro.