ROS2 Security: CLI tools

Hi all, my team at Amazon has been working on security tools for ROS2 (see related thread), and as part of that effort we’d like to help with adding support for secure CLI.

Security-enabled ros2cli

  • What does this mean?

In essence, being able to create keys & define a permission policy for ros2cli, enabling all CLI commands (such as ros2 topic echo/pub, ros2 param set/get, as well as ros2 bag) to operate with those permissions and using the supported DDS security plugins (authentication, encryption, access control).

  • Why isn’t it supported today?

Part of the issue lies with the fact that the CLI tool starts up different nodes using different names, and sometimes those names are only determined at runtime (e.g. due to appending the process ID to the name). The security directory lookup is done by the node’s name, overall making it unfeasible to generate and use keys & security settings for CLI nodes.

Approaches to enabling security for ros2cli

We have investigated a few possible solutions and have a proof of concept working for one (Option C). We’d appreciate your input before proceeding.

  1. Option A: Consistent node naming, known at compile time. This essentially means having all CLI nodes use the same name (for example, “ros2cli_node”). This is the simplest approach; however, while non-unique node names should be possible with DDS, ROS2 makes no such guarantee and empirically this has been shown to cause problems [1].

  2. Option B: External override to the node’s security directory. This is the approach proposed by ruffsl at https://github.com/ros2/sros2/issues/69 and it involves allowing a special environment variable to override the security directory lookup for the node.
    Usage could look something like,

    • ROS_SECURITY_NODE_DIRECTORY=~/ros2cli_keys ros2 topic echo /sometopic
      or potentially having the ros2cli tool supply that environment variable when invoked with a special flag.

    Overall this is a good approach, but it relies on environment variables which could be error-prone and introduces the minor inconvenience of having to invoke the CLI tool in a different manner.

  3. Option C: Directory lookup based on longest-prefix match rather than an exact match, and have all CLI nodes use the same predetermined prefix. Thus when a node named _ros2cli_node_12498 starts up, it would settle for the directory _ros2cli_node if it can’t find any better match.

    From the user’s perspective, usage would stay exactly the same. Since CLI nodes would start with the same prefix, say _ros2cli_node, you could simply set up security for all CLI nodes by using that prefix, e.g.:
    ros2 security create_key <key store> _ros2cli_node

    • Summary of code changes needed for Option C
      • rcl: longest prefix matching here
      • ros2cli: use the ros2cli_node prefix when starting up nodes by changing the following
      • rosbag2: Pass in the prefix from ros2cli all the way down to here
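
To make the Option C lookup concrete, here is a minimal Python sketch of longest-prefix matching over the directory names in a key store. It is only an illustration of the logic, not the actual rcl change (which would be in C); the function name `longest_prefix_match` and the sample directory names are assumptions made up for this example.

```python
from typing import Iterable, Optional


def longest_prefix_match(node_name: str, directory_names: Iterable[str]) -> Optional[str]:
    """Return the longest directory name that is a prefix of node_name.

    An exact match is simply the longest possible prefix, so it always wins.
    Returns None when no directory name is a prefix of the node name.
    """
    # Ignore empty names: an empty string is a prefix of everything.
    candidates = [d for d in directory_names if d and node_name.startswith(d)]
    return max(candidates, key=len, default=None)


# Hypothetical key store contents, for illustration only.
keystore = ["_ros2cli_node", "talker", "talker_extra"]

print(longest_prefix_match("_ros2cli_node_12498", keystore))
print(longest_prefix_match("talker_extra_1", keystore))
print(longest_prefix_match("listener", keystore))
```

Note that with this logic a node named ros2cli_node_123 (no leading underscore) would not match the _ros2cli_node directory, which is why the hidden/visible naming discrepancy matters.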

Thoughts on the different approaches discussed? Is there a potential problem we missed? Is there a fourth option worth considering?

Hidden node feature

Currently, the subscriber node (the one used for ros2 topic echo) is started as “hidden” (underscore prefix). If we keep that discrepancy, with Option C users would need to generate two sets of keys & policies:

  • ros2 security create_key <key store> _ros2cli_node to grant permissions to hidden CLI nodes
  • ros2 security create_key <key store> ros2cli_node to grant permissions to all other CLI nodes

This obviously complicates the setup, but regardless: what’s the use case for that discrepancy between the CLI nodes? Is that a discrepancy we’d want to keep? In ROS1, CLI nodes are all visible as far as I know. Considering that this functionality is at the CLI layer and can be easily ignored with --all, I wonder whether that discrepancy is something worth keeping.

We’d greatly appreciate any thoughts you might have on the various options presented, as well as on the matter of aligning the CLI nodes’ prefix.

Thanks!
Avishay


If the ros2cli tool supplies the environment variable itself when it runs, then that removes the error-prone aspect and the inconvenience for the user. The user wouldn’t even need to know it was happening.

I got the impression from another post of @ruffsl’s that any kind of wildcard matching when doing security policies was a Bad Idea[tm].

This sounds to me more like an oversight in the ros2cli tools implementation than an intentional design choice.

I am wondering what the goal of making ros2cli “security-enabled” is - maybe you can clarify?

If you grant e.g. the ros2 topic echo command access to subscribe to a specific topic, what makes this permission specific to the CLI tool? Isn’t that conceptually equivalent to allowing “everyone” to subscribe to that topic? If not, any other entity could simply invoke the command line tool and “get” the information from there, since the tool is designed to “output” the requested information.

So I would argue that instead of giving ros2cli any kind of explicit permission you could grant the same permission to any entity (which would make the need to identify the command line tool obsolete).

Thank you for all your comments!

Unless I’m missing something, you’d still have to specify the override directory, so the best we could do with that approach would probably look like:
ros2 <verb> <action> --sdir="~/cli_sec_dir"
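
As a rough sketch of how such a flag could be mapped onto the environment-variable override, the Python below strips a `--sdir` argument from the command line and returns an environment with ROS_SECURITY_NODE_DIRECTORY set. Both `--sdir` and the variable name are the ones proposed in this thread, not existing ros2cli features, and the helper name `apply_security_dir_flag` is hypothetical.

```python
import argparse
import os
from typing import Dict, List, Mapping, Tuple


def apply_security_dir_flag(
    argv: List[str], base_env: Mapping[str, str]
) -> Tuple[List[str], Dict[str, str]]:
    """Split off a proposed --sdir flag and turn it into an environment override.

    Returns the remaining arguments and a copy of base_env. The copy gains
    ROS_SECURITY_NODE_DIRECTORY (the variable name proposed in this thread)
    only when --sdir was given.
    """
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    parser.add_argument("--sdir", default=None)
    known, remaining = parser.parse_known_args(argv)

    env = dict(base_env)
    if known.sdir is not None:
        # A quoted "~/..." is not expanded by the shell, so expand it here.
        env["ROS_SECURITY_NODE_DIRECTORY"] = os.path.abspath(
            os.path.expanduser(known.sdir)
        )
    return remaining, env


args, env = apply_security_dir_flag(
    ["topic", "echo", "/sometopic", "--sdir=~/cli_sec_dir"], os.environ
)
print(args)
print(env["ROS_SECURITY_NODE_DIRECTORY"])
```

The returned environment could then be applied before the CLI node is created, so the lookup code itself would only ever need to know about the environment variable.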

I’d like to understand what this was based on. ROS2 Security is not really reliant on naming - ROS nodes can change their name and decide to use whatever name they want. It relies on having a set of keys & signed permissions files locally accessible by the node. The way it performs dir lookup is a minor implementation detail. To minimize confusion or unexpected behavior, the node could print a log message that says which security directory was loaded.

We assume that most production ROS2 systems would use ROS2 Security. Not necessarily right from the start, but that’s the direction it’s going towards. With that in mind - we thought that when operating a production system, having the ability to use the CLI tools would be extremely useful in investigating & resolving issues. Does that make sense?

For example: right now, if I have access control & encryption enabled, I can’t use rosbag2 to record node communication without creating permissions explicitly for that node. Then say I want to do ros2 topic pub, I would need to generate a new set for the CLI publisher. I may want to run ros2 param get but once again I’d need to generate a new set for the CLI service node. I wouldn’t be able to ros2 topic echo at all because the name is not known in advance.

If you grant somenode permissions, what makes those permissions specific to somenode? Precisely the fact that the somenode executable has the platform-specific permissions to access and read the locally-stored keys & permissions files.
As an extension of that, a developer working on ROS2 systems would want to differentiate between development machine setup and production setup. For example:

  • Development setup: generate ros2cli_node security dir and have it accessible to assist throughout development (alternatively, turn off security when troubleshooting).
  • Production setup: do not deploy ros2cli_node security dir to the system. When there’s an issue on a specific robot, deploy it to that robot in order to use the CLI tool for the investigation (alternatively, have it deployed but with locked down file permissions, depending on the level of security you need).

I’m not sure what you meant here, could you elaborate? To clarify, ros2cli would not get any special treatment. The user would still need to generate keys & permissions for it.

It was this post in the context of the actions design discussion.

I see that he was referring to topics and you are talking about directories on the file system, so probably it’s not relevant?

A little late to the party here, but I’ll try and touch on all points raised thus far.

In regards to Option A, I agree deterministic node naming prior to runtime would simplify a lot of the use cases presented, however I think the added complexity in dealing with non-unique node names in ROS2 would be a non-starter.

That being said, I think it’s about time we started to contemplate contingency strategies for routing with non-unique node names when dealing with multi robot systems. E.g. how can ROS2 be made conscious of, or distinguish, duplicate nodes of common names running on separate robots in a homogeneous swarm system, sharing the same DDS domain? The subject of multi robot namespacing is perhaps a whole new world of issues that I think warrants its own security thread, and was an item we touched on in the Future Work section of Procedurally Provisioned Access Control for Robotic Systems.

As you’ve linked with Option B, ros2/sros2/issues/69, I’m presently a proponent of extending the environment to enable the user to override or specify the exact security artifacts to be loaded. In time, with support for added sources of secure storage like Trusted Platform Module (TPM), I expect such configuration options to expand. For now, I think the following PRs strike a decent balance between configuration generality and specificity:

With regards to Option C, I don’t think the strategy of directory lookup based on longest-prefix match would be wise. Rather than an exact match lookup, using longest-prefix match could lead to unanticipated collisions or loading of unintended security credentials. Personally from a user perspective, I’d prefer that nodes either load the exact security artifacts I point to or none at all and error out; rather than silently finding an applicable match elsewhere at runtime.

For generating security credentials, however, I would be more receptive to using more flexible logic in matching nodes to permissions. In fact this was something I did back in SROS1, where a keyserver was used to provision permission certificates to nodes. In a newer profile language for ROS2 in development, ComArmor, I’ve tried to refine this idea of lists of attachment expressions, evaluatable via regex or fnmatch.
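
To illustrate what fnmatch-style attachment expressions could look like at credential-generation time, here is a small Python sketch. It does not reproduce ComArmor’s actual syntax or API; the profile table, the profile names and the `profiles_for` helper are all hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

# Hypothetical profile table: profile name -> list of attachment expressions.
# This is NOT ComArmor's real format, only an illustration of the idea.
PROFILES: Dict[str, List[str]] = {
    "cli_tools": ["/_ros2cli_node_*", "/ros2cli_node_*"],
    "talker": ["/talker"],
}


def profiles_for(node_name: str, profiles: Dict[str, List[str]]) -> List[str]:
    """Return the names of all profiles with an expression matching node_name."""
    return [
        name
        for name, patterns in profiles.items()
        # fnmatchcase is case-sensitive on every platform, unlike fnmatch.
        if any(fnmatchcase(node_name, pattern) for pattern in patterns)
    ]


print(profiles_for("/_ros2cli_node_12498", PROFILES))
print(profiles_for("/listener", PROFILES))
```

The point of doing this when generating credentials, rather than at lookup time, is that the resulting artifacts are still loaded by an exact path at runtime.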

Again, I think this would be a good case for leveraging the environment during debugging and development, as I can specify all nodes spawned by the invoked command to utilise the exact security artifacts I set. This might become tricky if I’m using something like ros2 launch to start my nodes with unique credentials, and the nodes themselves have dynamic node names. At that point, you could modify your .launch.py to pass in the appropriate environment for each node, or remap the node name via run args to be deterministic.

In regards to hidden nodes or hidden topics, I think that kind of meta information would be nice to embed into the discovery or data tag information, rather than just '_'ed namespaces. This would be useful if one wanted multiple or granular levels of visibility, not just a binary flag; e.g. {DDSTopicName:”rt/movit/log”, DataTag:[{key:”ros2.subsystem.visibility”, value:”level.debug”}]} vs {DDSTopicName:”rt/movit/_log”}

The goal is to ensure that ros2cli tooling remains functional when security is enabled, at least to the degree possible under the permissions granted. As for what permissions are provisioned, and who is given those security artifacts, that is between you and your friendly sysadmin :hammer_and_wrench: . I’ll paint a scenario to illustrate the use case.

Say some end developer, Bob :construction_worker_man:, orders a black box robot :robot: from some OEM, ACME :factory: . ACME is in possession of both the Permissions and Identity Certificate Authorities (CAs), and continuously controls what artifacts are installed/deployed onto the black box robot. ACME, like any CA provider, also has an online portal where Bob can send certificate requests and permissions for signing. This allows Bob to mediate who is authorized to communicate with the robot, while ACME can additionally regulate what permissions are granted.

Assuming the CAs are wise enough to only sign certificate requests with unique subject names, and access to security artifacts on the robot is controlled (via TPMs, ARM Trust Zone, Intel SGX, etc.), this would allow ACME to protect system critical interfaces in the robot, while simultaneously provisioning Bob with the minimal subset of permissions necessary for developing with the product. E.g. enabling Bob to ros2 topic echo sensor data :camera: for calibration with the end user app, but not screw with factory set PID gains params in the safety controller :fire: .

Later, if Bob needs to take the robot to the local certified shop :toolbox: for repair, ACME could provision the shop with the admin level permissions to debug dangerous or system critical interfaces. ACME would also take care to only issue such admin level permissions with a validity set to expire after the expected duration of the repair.

Ideally, I’d like to afford Bob possession of the Identity CA, since Bob is going to be held responsible for the robot in the end application, but this would require some alterations to the binding between permissions and identity certs in the default Secure DDS plugins. Presently this is via matching subject name fields, but could be customized to bind to something else like a cert’s public key. In any case, this paradigm will probably work against the Right to Repair, a movement I concur with; yet this is a service model I’ve seen quite a few robotics OEMs request, and expect to see more of in the future.

I’m not sure I caught that either, but I’m guessing @dirk-thomas is asking what the point is of granting ros2cli permissions and credentials if any old application could use those same security artifacts, i.e. the transport layer security artifacts are not inherently bound to a specific application layer.

To that I’d argue that even though nothing inherently restricts one set of security artifacts from being used for a different purpose, aside from controlling access to those security artifacts, selectively specifying the security artifacts for ros2cli tooling to use, separately from the rest of my ros2 application, helps development. In the scenario above, I might want my ros2cli on my workstation to have super admin permissions (* for subscribe and publish permissions) so I can triage whether it’s the topic publisher or the subscriber in the robot that has insufficient privileges.

An alternative might be to support ros2 run like args in more general ros2clis, like:

ros2 topic echo /chatter __ns:=/ __node:=listener

:+1: I would also appreciate a log entry for location of the resolved security directory used. That’d make debugging missing or invalid security artifacts easier, knowing exactly where the node was looking first.

Yes, that was my little rant against core ros2 subsystems embedding uncontrollable dynamic strings in namespaced resources, making static access control infeasible and thus necessitating the use of wildcard expressions for even basic features like actions. My cautionary stance towards longest-prefix matching lookup remains, but given that node names in general are controllable/accountable, it’s not as bad as embedding a GUID/PID number in an object namespace.

Thanks for the detailed responses @ruffsl :slight_smile:
Thinking about it some more, I don’t think options B and C contradict each other; I think we should implement both.
So if the override is specified, we would use that. Otherwise, we’d prefix match. That would allow a variety of use cases while still being fairly simple. Of course, we need to make sure to log the exact security directory loaded, and document the lookup mechanism appropriately.
I’ll have a PR out soon.
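
The combined lookup order described above (explicit override first, prefix match otherwise, always logging the result) could be sketched in Python roughly as follows. This is an illustration under assumptions, not the code from the pull requests: the function name, the logger name and the `keystore_root` parameter are hypothetical, and ROS_SECURITY_NODE_DIRECTORY is the variable name proposed earlier in this thread.

```python
import logging
import os
from typing import Mapping, Optional

logger = logging.getLogger("security_lookup")  # hypothetical logger name


def resolve_security_directory(
    node_name: str, keystore_root: str, env: Mapping[str, str]
) -> Optional[str]:
    """Pick a security directory: env override first, else longest prefix match."""
    override = env.get("ROS_SECURITY_NODE_DIRECTORY")
    if override:
        # The override is returned as given; checking that it exists is left out.
        chosen, reason = override, "environment override"
    else:
        try:
            names = [
                entry
                for entry in os.listdir(keystore_root)
                if os.path.isdir(os.path.join(keystore_root, entry))
            ]
        except FileNotFoundError:
            names = []
        matches = [name for name in names if node_name.startswith(name)]
        if not matches:
            logger.warning("No security directory found for node %r", node_name)
            return None
        chosen = os.path.join(keystore_root, max(matches, key=len))
        reason = "longest-prefix match"
    # Always report which directory was loaded, and why.
    logger.info("Node %r using security directory %r (%s)", node_name, chosen, reason)
    return chosen
```

Logging the reason alongside the path makes it visible when a node silently fell back to a prefix directory rather than an exact or explicitly specified one.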

We have submitted the relevant pull requests, see: