I’ve been receiving a number of private messages about SROS, and so rather than let all this valuable discussion accumulate dust in my inbox, I thought I’d start a thread here for the rest of the community’s benefit.
I should think for now, Q&A about the design of SROS would also be appropriate here, as SROS is bound to change, and so answers.ros.org may not yet be the best place to archive such topics during early development.
So please give us a shout-out or post follow-ups about SROS here so we can all stay on the same page. I’ll try to be as active as grad school will afford, but posting here may enable someone else to respond when my inbox backs up.
The following is my current understanding of the SROS workflow:
SROS keyserver first creates keypairs and certificates for the root and master nodes.
When publisher/subscriber nodes are launched for the first time, the keyserver creates keypairs and certificates for them. Subsequently, nodes send their certificates to the master and master signs/verifies the certificates and registers the nodes.
Finally, these nodes can use these master-verified certificates to create a TLS channel between them and have an encrypted communication.
I hope you could answer a few questions on SROS for me.
Do you assume a one-to-one synchronous TLS communication between the nodes using topics for communication (similar to ROS services)?
If I understand correctly, the keyserver only generates the keys and certificates for the nodes (based on the user-defined configuration, policies, etc.) but it does not sign the node’s public key certificate, i.e. the keyserver does not act as a CA for nodes. In the current SROS implementation, does the rosmaster register the node and sign its certificate, acting as a CA, when a node connects to the rosmaster for the first time?
If yes, does the rosmaster authenticate or check the node’s authorization rights before it registers the node and verifies its certificate?
In your ROScon slides (http://roscon.ros.org/2016/presentations/sros.pdf), you mention in your ToDo slide (Slide #11) “Harden Master and Slave API calls where caller’s privilege must be checked before response”. Have you already dealt with modifying ROS API functions in SROS to include an authentication token?
I am thinking of an SROS extension which includes modifying the Master API call (register call) from a node to the master, to include a node-authentication-token. This would mean that the node registration would be successful at the master only if this token is verified correctly. In addition to your answers to the above questions, it would be great to know your thoughts on this idea.
Great questions! I’ll try to answer each in turn, but first let me clarify some possible misconceptions.
If we are starting from scratch, then:
Upon starting the SROS keyserver, it would presumably have no initialized keystore, and thus would generate some certificate authorities (CAs) as specified by the default config. You could also give it a CA to start with, or tell it to make an intermediate CA from it as well. Currently, by default, it’ll generate a root CA and then use that to make an intermediate master CA. The default config tells the keyserver to reserve the master CA for ROS nodes, but again this is all stuff you could reconfigure or customize. However, these CA certificates are not the same ones that are used by, say, the ROS master node for transport. Giving a single certificate both CA and transport use-permissions would be dangerous, so we always separate the roles of CA and transport certs. That’s why in the ROSCon demo, the tree command listed two master certs, one in the keystore’s CApath and another in the master node’s individual nodestore.
If you then ran sroscore, then the rosmaster, roslaunch, and roslog nodes would start and connect to the keyserver before they could talk to each other. Nodes request certificate and key pairs from the keyserver, BUT they do not do this by sending a whole certificate signing request (CSR). With the intent of keeping the modifications to the ROS client library as light as possible to support SROS, I chose not to make the client responsible for generating the CSR, as this would bring a host of other PKI-generation dependencies into the client library itself.
Instead, I deferred that responsibility to the keyserver, as only it is fully aware of the policy profile context necessary for embedding the access control extensions into the X.509 certificate. If nodes did submit CSRs, then we’d also have to give the keyserver the ability to scrutinize those CSRs for permission escalations, just like your web certificate authority would do if you were requesting an SSL cert for an HTTPS domain. Rather than doing all of that, the node simply provides its own ROS graph name, and the keyserver uses this to look up the applicable certificate recipe.
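To make the lookup idea concrete, here is a minimal sketch of how a keyserver could map a requesting node's graph name to a certificate recipe. Everything in it is hypothetical: the `RECIPES` table, its field names, and `lookup_recipe` are made up for illustration and are not the actual SROS config format or code.

```python
import fnmatch

# Hypothetical policy profile: graph-name patterns mapped to certificate
# "recipes". The field names are illustrative only, not the SROS format.
RECIPES = [
    ("/rosout", {"signing_ca": "master", "publish": ["/rosout_agg"], "subscribe": ["/rosout"]}),
    ("/camera/*", {"signing_ca": "master", "publish": ["/camera/*"], "subscribe": []}),
]


def lookup_recipe(node_name, recipes=RECIPES):
    """Return the first recipe whose pattern matches the node's graph name.

    Returns None when nothing matches, so an unknown name gets no certificate.
    """
    for pattern, recipe in recipes:
        if fnmatch.fnmatchcase(node_name, pattern):
            return recipe
    return None
```

A node asking as `/camera/driver` would get the second recipe, while a name matching no pattern gets nothing. Note that the name alone proves nothing about who is asking, which is the trust problem discussed next.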
Now you may ask yourself, how does this solve the chicken-and-egg problem of establishing trust if any old node can ask the keyserver using any old namespace? Well, it doesn’t, but we provide some methods to do this. The first is that the user can specify secrets (tokens, you might say) to the keyserver (via shell environment) that it uses to encrypt the private keys for a given node namespace; requesting nodes should then also be given that secret by the user (via handcuffed briefcase, ssh, etc.) at runtime so the node can decrypt its own received private key.
Another way is to enable the client-verification mode the keyserver uses for TLS connections from ROS nodes. This would require nodes to use an intermediate certificate (signed by a CA that the keyserver trusts) to connect to the keyserver at all. However, you really shouldn’t be running your keyserver on an exposed network to begin with.
Once the nodes have the keys and certs, then upon their p2p connection they verify each other’s certificates. However, the signature does not necessarily need to be from a “master” CA; it just needs to be from a CA that is already trusted by being in the CApath for the keystore of the verifying node. Checking that the cert signatures are from trusted CAs within their own CApath allows for a multitude of other schemes, such as signing certain subgraphs of the ROS network with different levels of CAs, or distributing only certain CA public certs to the limited CApath of a special node’s keystore.
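As a rough illustration of CApath-based peer verification, here is a minimal sketch using Python's standard `ssl` module. The helper name `make_node_tls_context` and its parameters are my own for this example and not taken from the SROS source; disabling hostname checks is an assumption based on peers being identified by ROS graph name rather than DNS name.

```python
import ssl


def make_node_tls_context(capath=None, certfile=None, keyfile=None, server_side=True):
    """Build a TLS context that rejects peers without a certificate signed
    by a CA found in this node's own CApath (hypothetical helper)."""
    protocol = ssl.PROTOCOL_TLS_SERVER if server_side else ssl.PROTOCOL_TLS_CLIENT
    context = ssl.SSLContext(protocol)
    # Assumption: identity lives in the certificate's policy fields,
    # not in a DNS hostname, so hostname matching is turned off.
    context.check_hostname = False
    # Require a certificate from the peer on both sides of the connection.
    context.verify_mode = ssl.CERT_REQUIRED
    if capath is not None:
        # Only CAs placed in this directory are trusted by this node.
        context.load_verify_locations(capath=capath)
    if certfile is not None:
        # The node's own transport certificate and private key.
        context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return context
```

Because each node loads only its own CApath, two nodes can end up trusting different sets of CAs. OpenSSL expects the CApath directory to use its hashed file-name layout.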
Well, because TLS is a stateful connection with a handshaking process, it requires a reliable transport such as TCP. So essentially all topic, service, and parameter level API calls in SROS currently use TLS/TCP for the network layer. I am not fully aware of the best methods to do this for UDP-based connections, but perhaps someone else might want to interject here about that.
Currently the keyserver does in fact fulfill that role of signing, as it uses the CAs you give it to sign the public certificates it generates. When nodes connect to the master node, they will already have had their certificates properly signed, otherwise the master node would just reject the TLS connection at the handshake stage. However, the master does indeed check that the node is authorised to register or deregister a topic (among other API actions) by scrutinizing the access control extensions in the certificate of the TLS connection.
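To give a feel for that kind of check, here is a minimal sketch, assuming the access control extensions have already been parsed out of the peer's certificate into a plain dict of allow/deny patterns. The dict layout and the `is_authorized` function are hypothetical stand-ins, not the actual SROS validators.

```python
import fnmatch


def is_authorized(permissions, action, name):
    """Decide whether a peer may perform `action` on the graph resource `name`.

    `permissions` is a hypothetical stand-in for parsed certificate extensions:
    {action: {"allow": [patterns], "deny": [patterns]}}.
    Deny rules take precedence, and anything not explicitly allowed is refused.
    """
    rules = permissions.get(action, {})
    if any(fnmatch.fnmatchcase(name, pattern) for pattern in rules.get("deny", [])):
        return False
    return any(fnmatch.fnmatchcase(name, pattern) for pattern in rules.get("allow", []))


# Example permissions for a talker node (made up for illustration).
talker = {
    "publish": {"allow": ["/chatter", "/rosout"], "deny": []},
    "subscribe": {"allow": [], "deny": ["*"]},
}
```

With these permissions, `is_authorized(talker, "publish", "/chatter")` passes, while any attempt by the talker to register as a subscriber is refused.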
This follows from what I mentioned for the previous question, in that nodes, including the master node, have validators that are used to check if the other node is authorised to serve or call a given API function. You can see some of this in the source code here:
So I’ve added basic checks to most of the API, but there is the bigger todo/question as to how we should safeguard the rest of the less obvious API calls. Personally, I think defining an OID for the remaining API functions and checking for its presence in a certificate might be a good way of designating permission for their execution or calling. However, I’d like to avoid getting too verbose and overloading the certificate payload with so many rules. So perhaps we could partition the API calls into categories and allot OIDs to those groupings (like some calls only a master can serve, and some only a slave can call).
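A minimal sketch of what such a grouping could look like, purely as an illustration of the proposal. The category names, the assignment of API calls to categories, and the OIDs are all placeholders of mine (the OIDs sit under the `2.999` example arc), and nothing here is implemented in SROS.

```python
# Placeholder OIDs under the 2.999 example arc; not real SROS assignments.
CATEGORY_OIDS = {
    "master_serves": "2.999.1.1",  # calls only a master should serve
    "slave_calls": "2.999.1.2",    # calls only a slave should make
}

# Hypothetical grouping of a few ROS Master/Slave API calls into categories.
API_CATEGORIES = {
    "registerPublisher": "slave_calls",
    "registerSubscriber": "slave_calls",
    "getSystemState": "slave_calls",
    "publisherUpdate": "master_serves",
    "paramUpdate": "master_serves",
}


def may_invoke(cert_oids, api_name):
    """Return True if the certificate carries the OID for the call's category.

    `cert_oids` is the set of policy OIDs found in the peer's certificate.
    Calls without an assigned category are denied by default.
    """
    category = API_CATEGORIES.get(api_name)
    if category is None:
        return False
    return CATEGORY_OIDS[category] in cert_oids
```

This keeps the certificate payload down to one OID per category rather than one rule per API function, at the cost of coarser control.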
The other question is whether we should filter or censor certain API function returns. For instance, some API calls are used by nodes to ascertain the existence of a parameter. Here is one such example where the master considers the requesting node’s permissions and redacts any parameter namespaces the node was not authorised to read from:
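Separately from the SROS source itself, the redaction idea can be sketched in a few lines. This is only an illustration under assumptions: `redact_param_names` is a hypothetical helper, and the readable patterns are assumed to have been extracted from the caller's certificate already.

```python
import fnmatch


def redact_param_names(all_names, readable_patterns):
    """Return only the parameter names the caller is allowed to read.

    `readable_patterns` stands in for the read permissions taken from the
    requesting node's certificate; names matching none of them are dropped.
    """
    return [
        name
        for name in all_names
        if any(fnmatch.fnmatchcase(name, pattern) for pattern in readable_patterns)
    ]
```

For example, a node allowed to read `/camera/*` and `/rosdistro` that asks for the parameter list would never learn that `/arm/secret` exists.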
Hmm… well, what has been implemented is a bit different from what you mentioned. The way I’ve approached this so far is to try to have minimal impact on the existing ROS API (other than to enforce access control policies). I get away with this by leveraging the context of the IP socket connection to ascertain the certificate credentials there. By piggybacking on the transport security layer, I don’t need to exchange additional meta information that would necessitate extending the existing ROS API, and I also get the added assurance that the PKI element (the X.509 certificate) is valid (in trusted signature, expiration time, revocation, etc.), so I don’t need to verify the policy information again.
I think it would be best to keep the policy definitions somewhat autonomous, in the sense that every node/identity is branded with the policies it was given at birth, like branded cattle, so that anyone who later encounters the node wandering out in the wild and can interpret the policy/branding will know how to handle it. This avoids the need for some centralized policy lookup authority to recognize what the connecting peer can do. I think staying in the realm of PKI would also be good for transitioning to DDS security features (which also use PKI) for ROS2.
I’ve been debating with some folks about the best method for circulating policy restrictions in SROS for enforcing access control. The two approaches so far are:
I’d like to invite the rest of the community to put forth their own opinion, and so I have started a short wiki entry expanding upon the approaches. Please feel free to reply with your remarks here and/or concisely clarify the comparison on the wiki as you see them:
I was watching some talks on the developing TLS 1.3 protocol, just checking on what’s been happening in recent drafts, and noticed that the authentication step in the connection handshake is now encrypted. Here is another video, slightly more introductory.
This is really cool, as it can bring privacy to certificate extensions, preventing, say, a client’s access policy from being revealed to a passive attacker. I’m not yet sure I understand to what extent this applies to the server’s certificate, or holds against active attackers. For that I may have to follow the mailing list discussions more closely or check out a current implementation.
Perhaps with TLS 1.3, this might void some of the remarks I made earlier about the potential drawbacks of SROS piggybacking on the transport layer encryption. The reduced number of round trips would also help improve the connection time between SROS nodes.
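For anyone wanting to experiment, here is a minimal sketch of pinning a context to TLS 1.3 with Python's standard `ssl` module (Python 3.7+ built against an OpenSSL with TLS 1.3 support). The `require_tls13` helper is hypothetical and this is not something SROS currently does.

```python
import ssl


def require_tls13(context):
    """Refuse any handshake below TLS 1.3 on the given context (sketch only)."""
    if not ssl.HAS_TLSv1_3:
        raise RuntimeError("this Python/OpenSSL build has no TLS 1.3 support")
    # Older protocol versions send certificates in the clear during the
    # handshake, so they are ruled out entirely here.
    context.minimum_version = ssl.TLSVersion.TLSv1_3
    return context


context = require_tls13(ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER))
```

Both peers would need the same floor for the certificate extensions to stay hidden from a passive observer.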