IPC in ros2

@gavanderhoorn thanks for catching that. I’ve just committed the Apache 2 LICENSE file.

Please let us know if you have any questions.

Ciao!

I’ve finally found the time to read through both submissions (and just in time, too, with the OMG meeting next week!). Here are the notes I took for myself as I was reading through them, along with some thoughts at the end.

RTIandCo submission

  • Client-server protocol to allow a resource-constrained device to interact with a DDS domain via a gateway (the “Agent” server).
  • Use of the client-server (broker) architecture is what allows the low resource usage.
  • The specification defines a simplified object model that acts as a facade to the standard DDS object model, enabling lower resource use to access a DDS domain.
  • Most DDS configuration is assumed to be doable on the agent (DDS side), so configuration options on the XRCE side are limited. This contributes to the simplified object model.
  • Access control, access rights, and managing disconnected clients are new features (over base DDS?) included in the facade object model.
  • Management of disconnected devices is handled using a session concept that persists across connections between the client and the server (e.g. when the client goes to sleep).
  • A pull mode is available for clients that do not want data coming in randomly. The client can query the object model on the server rather than changes being updated and pushed out in real-time.
  • The specification can be used for anything from extremely simple, pre-configured clients up to fully capable DDS devices (why these cannot just use DDS is not made clear).
  • The object model is resource-based: DDS-XRCE types, DataWriters, DataReaders and so on are represented as resources with a name, properties and behaviour.
    • Resource implementation is outside the scope of this document.
    • Resources may be shared or dedicated, e.g. multiple clients might share a single DataWriter on the server.
  • Clients can only talk to each other via the DDS domain. i.e. Client 1 -> Server -> DDS domain -> Server -> Client 2. Multiple servers may also be involved.
  • Data can be sent as a single sample, a sequence of samples, either of these with metadata, or packaged data.
  • References to objects on the server can be made using a name (but it must be pre-defined?), an XML string, or a binary XCDR-serialised reference (although this is not available for all object types).
  • Clients can choose a QoS profile that is pre-defined on the server using a named reference. Or they can provide a QoS profile via DDS-XML that they wish the server to use for them. A combination of these is also possible.
  • All operations on the server are authenticated, and require a ClientKey. This is also used to identify clients.
    • Obviously authentication could be as broad as “anyone welcome”.
    • Creation and configuration of the ClientKey is out of scope (not great for interoperability).
  • Although the specification calls for authentication, it may be easy for a developer who is not careful and uses credentials widely to create clients that step on each other, messing with each other’s objects on the server.
  • In many ways this specification feels like a remote control for DDS, rather than a low-resource protocol and middleware in its own right.
  • The protocol is targeted at networks with a minimum of 40 Kbps of bandwidth, so you can give up on your 14.4 Kbps modem now.
  • A design goal of the protocol was that a complete implementation require “less than 100 KB of code”.
  • Clients absolutely cannot operate on their own; they must have the server available to function. No peer-to-peer communication is possible.
  • No vendor-neutral API is proposed.
  • The transport requirements are fairly strict. Fortunately most transports these days provide them. The requirements include:
    • Must be able to deliver messages of 64 bytes.
    • Message integrity must be guaranteed (but not reliability; messages may be dropped).
    • Must provide transport level security.
  • The protocol consists of a session, which carries one or more message streams with independent reliability settings. Each stream consists of ordered messages with sequence numbers so dropped messages can be detected and message order can be restored if the transport changes it.
  • The reliability setting of a stream is determined by the stream ID, rather than being a separate flag header or something like that. Streams with an ID in a certain range have a certain type of reliability. (Effectively the first bit of the stream ID is a flag for reliable or not.)
  • Each message contains one or more sub-messages.
    • This structure reduces some resource usage, e.g. a single header can apply to many sub-messages, or a single message can operate on multiple resources on the server.
  • The payloads of most submessages are XCDR-encoded binary data.
    • The payload can be up to 32 KB.
  • Message overhead is between 8 and 12 bytes, with an additional 4 bytes for every additional sub-message.
  • The interaction model is purposely simple, allowing for pre-configuration to replace DDS’s discovery, etc. It is possible to rapidly initiate a session and begin writing data, assuming the server is available, configured correctly and connected to the DDS domain.
  • A fairly well-thought-out heartbeat system is available to maintain reliable communication.
  • The discussion of overhead should have also considered low-overhead transports such as IEEE 802.15.4-based transports. TCP may be an average case, a good case, or a bad case for relative overhead but because no data is provided it is hard to say. (My own brief research suggests that TCP is not a good choice for evaluation.) Message overhead should be compared to the commonly expected payload size rather than the transport size, since the transport used is up to the implementer.
    • Some of the arguments against reducing overhead are not strong. Reducing the number of possible stream IDs (and thus the number of possible streams) is arguably not a problem; how many streams is a small device likely to need in the common use cases? 256 seems like a lot of data for a device when the common example of a DDS-XRCE device given is “a temperature sensor”. Needing 8 bits for the sub-message type to allow future evolution of the protocol smells like aligning things on an 8 bit boundary; dropping 4 bits would certainly leave only two slots for new sub-message types, but dropping 3 would leave 18 and dropping 2 would leave 50.
    • Ultimately the message overhead discussion comes down to knowing what the use case is. Does an extra byte here or there matter that much? For ROS, possibly not.
  • Sample message sizes:
    • 30 bytes to initiate a session.
    • 13 bytes to request to read a single sample of data, followed by 15 bytes reply for the (4 byte) sample.
    • 23 bytes to request multiple samples.
    • 47 bytes to receive a sequence of two 4-byte samples with meta-data (12 bytes per sample).
  • Although XML is syntactically more exact, a more compact and easier-to-process representation such as JSON could have been used instead. But, as noted, there is an existing DDS-XML specification so reusing it makes sense.
  • The demonstration implementation requires a microcontroller with 256 KB of RAM and running an operating system (NuttX). No demo with an OS-less microcontroller is mentioned. You won’t be running this on an Arduino.
  • The protocol is small and simple. It would be easy to implement (they state less than 2000 lines of code). It provides access to the entirety of DDS capabilities, which may be important for ROS, but it does so at the expense (in hardware and run-time costs) of needing a gateway server.
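
The overhead figures in the notes above can be turned into a small calculation. This is only a sketch of the arithmetic, not anything taken from the specification: the function names are made up for illustration, and the figure of 14 sub-message types currently in use is inferred from the remark that a 4-bit type field would leave only two free slots.

```python
def message_overhead(header_bytes: int, submessages: int, per_extra_submessage: int = 4) -> int:
    """Bytes of overhead for one message, using the figures from the notes:
    8 to 12 bytes for the message, plus 4 bytes per additional sub-message."""
    if submessages < 1:
        raise ValueError("a message carries at least one sub-message")
    return header_bytes + per_extra_submessage * (submessages - 1)


def overhead_fraction(overhead_bytes: int, payload_bytes: int) -> float:
    """Share of the bytes on the wire that is overhead rather than payload."""
    return overhead_bytes / (overhead_bytes + payload_bytes)


def free_type_slots(type_field_bits: int, types_in_use: int = 14) -> int:
    """Unused sub-message type values for a given field width.
    types_in_use=14 is an inference from the notes, not a quoted figure."""
    return 2 ** type_field_bits - types_in_use


if __name__ == "__main__":
    # A 4-byte sample in a single sub-message with the largest header.
    worst = message_overhead(header_bytes=12, submessages=1)
    print(worst, overhead_fraction(worst, payload_bytes=4))
    # Free slots when the type field shrinks from 8 bits to 4, 5 or 6 bits.
    print([free_type_slots(bits) for bits in (4, 5, 6)])
```

For a 4-byte temperature reading the overhead dominates the message, which is why the comparison should be against expected payload size rather than transport size.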

PrismTech submission

  • Despite appearing to be a more complex protocol during the presentations in September, the specification itself is half the length. Fewer diagrams?
  • This submission is much more formalised than the other.
  • The three main goals of this submission are extremely low footprint (an Arduino Uno is cited), extremely efficient wire protocol (overhead of just a few bytes), and supporting devices that regularly sleep.
  • This submission pays no attention to the API. It is only interested in the wire protocol.
  • Discovery is supported, and is also a separate compliance point so vendors don’t have to implement it if their target platform is too small.
    • Static configuration is possible.
  • Resources are used to represent information to be exchanged, with properties of these available. Resources are identified by a URI; the properties are always accessed via a /property postfix to the URI.
    • Reliable is the default setting.
    • Durable and transient resources are also available.
    • A query syntax that allows filtering resources is provided. For example, all resources where a data member(?) is above a given value. This is equivalent to the DDS filter expression topic subscription.
  • This submission uses an interaction model fundamentally similar to DDS, with DDS-XRCE participants reading and writing data in a data space.
    • An implementation can use a set of brokers, or a pure peer-to-peer infrastructure, or a mixture. XRCE clients can exist and function without any kind of special server.
  • The message header is a single byte, with 5 bits for message ID. This allows up to 32 message types.
    • Messages may be decorated with additional markers.
    • Variable length encoding is used for things like message length and integers.
    • Sequences and strings also have an encoding specified; XCDR is apparently not used.
  • Message payload may be any size within the limits of the transport.
  • Following discovery (or startup for a static configuration), a session is established between every pair of XRCE applications talking to each other. Part of opening a session includes ensuring that both sides can handle the same range of sequence sizes to avoid sequence number roll-over problems.
    • Sessions are kept alive as long as a message is exchanged during the specified lease period. There is a keepalive message that can be used when nothing else is sent. Both sides must actively maintain the session.
    • Sessions can exist across multiple transports, so it is possible to have multiple connections at the transport level using different transports and merge them into a single session, allowing the best transport at the time to be used (e.g. UDP for best-effort data and TCP for reliable data).
    • Multiple sessions cannot exist on the same connection because sessions are uniquely identified by the locator (i.e. address of the client). However since multiple readers and writers can exist within a single session this is not a significant limitation.
  • Authentication is included in the protocol, but the details are left up to the implementation.
  • After establishing a session, resources can be created using special messages. An atomic approach is supported, with all resources being requested and then a final commit message being sent to actually trigger their creation.
  • Data samples can be sent singly, in a stream, or in batches.
  • Data can be pulled or pushed.
  • Data fragmentation is supported allowing samples of arbitrary size.
  • There is a message available for round-trip latency estimation.
  • It is not clear how sleep cycles combine with the peer-to-peer operation mode. If one client sleeps, then wakes up and asks for data from another client (which it couldn’t receive earlier due to being asleep) but the publisher of that data is asleep, the system will deadlock.
  • Sample message sizes:
    • 3 bytes for discovery probe.
    • 4 bytes plus data size for a data sample.
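
To make the single-byte header and the variable-length encoding concrete, here is a minimal sketch. The notes only say that 5 bits of the header byte carry the message ID and that lengths and integers use a variable-length encoding; the bit positions (ID in the low 5 bits, the remaining 3 bits as flags) and the 7-bits-per-byte scheme with a continuation bit are assumptions chosen for illustration, not the layout from the submission.

```python
from typing import Tuple

MESSAGE_ID_BITS = 5
MESSAGE_ID_MASK = (1 << MESSAGE_ID_BITS) - 1  # 0b00011111, i.e. up to 32 message types


def pack_header(message_id: int, flags: int) -> int:
    """Pack a 5-bit message ID and 3 flag bits into one byte (assumed layout)."""
    if not 0 <= message_id <= MESSAGE_ID_MASK:
        raise ValueError("message ID must fit in 5 bits")
    if not 0 <= flags <= 0b111:
        raise ValueError("flags must fit in 3 bits")
    return (flags << MESSAGE_ID_BITS) | message_id


def unpack_header(byte: int) -> Tuple[int, int]:
    """Return (message_id, flags) from a header byte."""
    return byte & MESSAGE_ID_MASK, byte >> MESSAGE_ID_BITS


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, least significant group first.
    The high bit of each byte says whether another byte follows (assumed scheme)."""
    if value < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def decode_varint(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode a varint starting at offset; return (value, next_offset)."""
    result = 0
    shift = 0
    for index in range(offset, len(data)):
        byte = data[index]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, index + 1
        shift += 7
    raise ValueError("truncated varint")
```

With an encoding of this kind, any length below 128 costs a single byte, which is consistent with the very small sample message sizes listed above.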

My thoughts

The PrismTech submission is undoubtedly more complex, but it is also undoubtedly more powerful - although how much more depends on your use case. Most significantly, it supports discovery and DDS-XRCE applications do not need a server running to communicate even amongst themselves. The RTIandCo submission, on the other hand, is simpler but does not support any form of P2P communication, requiring a server to always exist even if you only have DDS-XRCE applications. Both would require some kind of gateway (which is explicitly present in the RTIandCo submission) to talk to DDS-RTPS, but while the PrismTech one would require the data to be unpacked and repacked, the RTIandCo one probably would not because it uses XCDR for DDS-XRCE.

The PrismTech submission is superior for tiny-scale devices. There are many examples of these in use today, such as sensor motes. But for the ROS use case, are such tiny devices relevant? Which submission is more suitable for ROS is not a straightforward question. PrismTech’s submission is better suited to building a standalone rmw implementation on top of, because it would not require that a server always be present. On the other hand, it lacks a lot of the QoS capability of DDS, which the RTIandCo submission supports. But the RTIandCo submission is more like rosserial than the fully decentralised communications middleware that the PrismTech submission is. This doesn’t mean that an rmw could not be built on top of it, but it would not be as straightforward to use, requiring additional functionality in roslaunch.

Based on the presentation, I got the impression that the PrismTech submission was very complex with many branching paths in processing a message, and the RTIandCo submission is relatively simple. Reading the specifications made clear that the RTIandCo submission is simple: it’s a simple protocol for a single task (proxying data between a DDS domain and a device). It would be easy to implement, but has drawbacks like needing a server for it to work at all. On the other hand, reading the PrismTech submission made clear that their protocol is not that complex. It’s not as simple as RTIandCo’s, but it’s straightforward, well thought out, and clearly designed for very small scale devices. Its decentralised nature would make it easier to use in a system where it is the only protocol in use, but if you want to mix RTPS and XRCE then you would need a gateway, and the gateway would necessarily be less efficient than that in the RTIandCo proposal. However, it would also be much less of a single point of failure.

A relevant question is, given that the PrismTech submission doesn’t support aspects of DDS like QoS (except for reliability), what is the benefit (aside from overhead) compared with using a subset of RTPS?

Thanks for the detailed comparison and the kind words :slight_smile:

If I may give some more context to a few of our choices:

  • Our proposal deliberately only specifies the encoding for the message headers, and nothing for the payload. The reason is that we want the protocol to be as widely applicable as possible, and mandating a single payload encoding would work against that. Obviously one needs to agree at some point what that encoding should be, but it could be negotiated or configured. As you noted, when interoperating with DDS, XCDR would be a sensible choice, but it is not necessarily the only sensible one: for example, OpenSplice has nicely integrated support for Google Protocol Buffers, and so deciding to send GPB encoded data could also be a good choice.
  • Regarding QoS, the intent is that the protocol is limited to those settings that matter at the protocol level, and durability and reliability are the only ones of the DDS QoS for which this is the case — e.g., history and deadline are really handled locally. All these other QoS can be specified as properties, so that the requested-offered model of DDS can be maintained in the bridging to DDS.
  • Peer-to-peer and sleep cycles pose a bit of a problem indeed, and we haven’t really paid much attention to the combination. Still, there are many examples of gossip protocols that do just that by adjusting their cycles to stay in sync, and you could build an implementation of this protocol in peer-to-peer mode that does the same thing. Whether that would be worth the bother is anyone’s guess.
  • We are really interested in doing RMW directly on top of our protocol, but we haven’t gotten around to it yet. I guess assuming an extremely restricted environment like my implementation does makes everything just a little harder …
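
As an illustration of how the requested-offered model could be maintained when the QoS travel as properties, here is a rough sketch of the check a bridge might perform. The property names and the dictionary representation are hypothetical; only the ordering rule itself (an offered kind must be at least as strong as the requested kind) follows the DDS requested-offered semantics for reliability and durability.

```python
# Kinds ordered from weakest to strongest, as in the DDS requested-offered rules.
RELIABILITY_ORDER = ["BEST_EFFORT", "RELIABLE"]
DURABILITY_ORDER = ["VOLATILE", "TRANSIENT_LOCAL", "TRANSIENT", "PERSISTENT"]

ORDERED_POLICIES = {
    "reliability": RELIABILITY_ORDER,
    "durability": DURABILITY_ORDER,
}


def offered_satisfies(offered: dict, requested: dict) -> bool:
    """Return True when every offered kind is at least as strong as the requested kind.
    Both arguments are plain property dictionaries (a hypothetical representation)."""
    for policy, order in ORDERED_POLICIES.items():
        if order.index(offered[policy]) < order.index(requested[policy]):
            return False
    return True


if __name__ == "__main__":
    writer = {"reliability": "RELIABLE", "durability": "TRANSIENT_LOCAL"}
    reader = {"reliability": "BEST_EFFORT", "durability": "VOLATILE"}
    print(offered_satisfies(writer, reader))  # writer offers more than the reader asks for
    print(offered_satisfies(reader, writer))  # the reverse pairing is incompatible
```

Other QoS such as history or deadline would not appear in a check like this at the protocol level, since they are handled locally.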

For all those who are interested, we have continued working on our protocol specification and the current version is now included in our repository.

@gbiggs, thanks for taking the time to read and analyse both submissions. I wish more OMG attendees would follow your example :slight_smile:

There are a few observations I’d like to make in addition to those Erik has just made.

  1. One of the main reasons for our focus on wire efficiency was to properly support transports that have a very small MTU or a very low byte budget per hour. One example is BLE (Bluetooth Low Energy), which on most devices fixes the MTU to 20 bytes (and sometimes it cannot be negotiated upward). The other example is LoRa, in which at most 400 bytes can be sent per hour! Thus, it is not as if we are obsessed with efficiency; it is really a matter of relevance and applicability.

  2. I think that to understand our proposal you have to look at the mechanism we provide to implement the DDS mapping and DDS-like behaviour. In essence, our properties mechanism allows us to map DDS QoS, and for an XRCE implementation targeting DDS interoperability, QoS such as deadline, transport priority, etc. can arguably all be implemented. It is in fact worth remarking that the DDSI-RTPS specification does not specify how QoS are implemented, besides Reliability. Durability, Group Coherency and all the others are implemented at the DCPS level. The same is true for XRCE. Do you start to see the analogy now?

  3. As you can see from http://zenoh.io, our code is less than 2000 lines and can run OS-less. Thus, if we compare protocol complexity and, as you mention, the RTI implementation is also around 2000 lines of code… perhaps our protocol is not so much more complicated to implement :slight_smile: In fact, I’d argue that for those interested in implementing only the client-to-broker protocol the complexity is similar.
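
To put the BLE and LoRa figures from point 1 into perspective, here is a back-of-the-envelope sketch. It takes the per-sample overhead figures quoted in the notes earlier in this thread (4 bytes for one proposal, 8 to 12 bytes for the other) and deliberately ignores transport framing, acknowledgements and session setup, so the results are only indicative. The function names are made up for the example.

```python
def samples_per_budget(budget_bytes: int, payload_bytes: int, overhead_bytes: int) -> int:
    """How many whole samples fit in a byte budget (e.g. 400 bytes per hour on LoRa)."""
    return budget_bytes // (payload_bytes + overhead_bytes)


def fits_in_mtu(mtu_bytes: int, payload_bytes: int, overhead_bytes: int) -> bool:
    """Whether one sample plus protocol overhead fits in a single transport unit."""
    return payload_bytes + overhead_bytes <= mtu_bytes


if __name__ == "__main__":
    # 4-byte samples within a 400-byte hourly budget.
    print(samples_per_budget(400, payload_bytes=4, overhead_bytes=4))
    print(samples_per_budget(400, payload_bytes=4, overhead_bytes=12))
    # A 12-byte sample within a 20-byte BLE MTU.
    print(fits_in_mtu(20, payload_bytes=12, overhead_bytes=4))
    print(fits_in_mtu(20, payload_bytes=12, overhead_bytes=12))
```

Under these simplifying assumptions, a few bytes of header decide whether a sample fits in one BLE packet at all, and roughly halve or double the number of readings that can be sent per hour over LoRa.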

In any case, thanks again for your thorough analysis… And if you have any questions or are curious about why certain things are the way they are in XRCE, please don’t hesitate to ask.

@kydos

P.S. Did you realise why declarations can be atomic? I guess with DDS you have experienced the challenge of needing a series of entities to be declared and having partial failures… Well, that was one of the things we wanted to prevent.

P.P.S. Notice that a Resource ID can be arbitrarily small or big, and that the constraint is that one resource ID identifies one resource, meaning that multiple IDs can be associated with the same resource… That has nice implications too…

@gbiggs, are you really in agreement with what was presented by the evaluation team?

Dear all,

I wanted to share with the ROS community the proposal we made yesterday afternoon to other DDS vendors, the OMG MARS Taskforce and the XRCE evaluation team on a possible way forward.

Before articulating the proposal let me give some context.

As a result of the XRCE standardisation process we need to select one of the proposals. Above, @gbiggs provides a good and independent analysis of the two proposals, with @eboasson and @kydos clarifying a few points. If you have not read those, I suggest you do so before reading on.

With reference to @gbiggs’s analysis, we have one proposal (ours) that is perceived as being slightly more complex but that supports peer-to-peer as well as client-to-broker and is more suited for constrained environments. The other (RTIandCo) appears to be simpler but only supports client-to-broker and carries more overhead.

Along with this information, please take into account that we (ADLINK):

  1. have made available our XRCE implementation as Open Source under Apache 2 as part of the project zenoh.io, and

  2. are going to release a C++ broker by the end of the year (we already have a Swift and a Scala broker implemented – some folks have seen these in action at various demonstrations), and

  3. are committed to making zenoh.io the XRCE reference implementation, both in terms of standards as well as quality

Now that the context is given, I can state the proposal I made to the other vendors, the task force & co:

Adopt our proposal and join forces, around the newly established open source project (zenoh), to accelerate the establishment of the XRCE standard in constrained environments.

The advantages of my proposal are several:

  1. End-users such as the ROS community would get access to an implementation of the standard much more quickly – essentially now.

  2. Other DDS vendors could immediately have constrained connectivity by simply integrating their DDS implementation with the zenoh.io broker.

  3. We would have an open source implementation of XRCE supported by all DDS vendors, which means no interoperability issues, faster evolution, and faster adoption.

  4. We would have a protocol that can do peer-to-peer as well as brokered communication, which is good for some use cases – most notably in robotics.

  5. We would have a protocol that could be deployed down to the sensors. Imagine for a moment having ROS-enabled sensors talking XRCE via low-power protocols or anything else that suits them.

  6. As the one protocol everyone uses and supports is open-source we would facilitate adoption immensely.

Collaboration can take us much further than competition. What has made humans excel is our ability to collaborate, not so much our ability to compete… Thus, why not in this case?

I am looking forward to hearing comments from the ROS community. Please speak up.

@kydos

Hi all,

eProsima already has an implementation of the XRCE DDS joint submission (RTI, Twin Oaks & eProsima) released as Apache 2.0, called eProsima micro-RTPS:

Github Repo: https://github.com/eProsima/micro-RTPS
Readthedocs: http://micro-rtps.readthedocs.io
Quick start video: https://youtu.be/XT-Y1CfOGJM

Micro-RTPS is the base for the project micro-ROS (eProsima, BOSCH, Acutronic Robotics, PIAP and FIWARE Foundation), a project to extend ROS2 to microcontrollers following the ROS2 principles.

We will be presenting the project in the industrial ROS conference next Tuesday, Dec 12. See here:
http://rosindustrial.org/events/2017/12/12/ros-industrial-conference-2017

@kydos (Angelo): We have not only a complete Open Source implementation, but a joint submission with the main DDS/RTPS providers (RTI, Twin Oaks, eProsima), and an ongoing project with some of the main ROS contributors: micro-ROS. What I was planning is to take some of the good ideas you have in your submission and incorporate those into the joint submission, always following the OMG process we have to create a new standard. Let’s organize the necessary meetings to get you on board.

@Jaime_Martin_Losa, you are just following in our footsteps. Just check the dates on the repositories, the number of supporters, the quantity of contributions, etc… Then the real question is why we should select the protocol that does less and takes more resources… I don’t find it a good technical argument.

You may think it is a question of ego, but I’d argue you should ask yourself the same. Our proposal is more general, more wire efficient and memory efficient. Thus technically, a rational thinker would join ours.

But again, ego and politics spoil rational thinking. But it is not too late for you to make the right choice :wink:

Have a good weekend.

@kydos

P.S. BTW, with your microRTPS you are bringing ROS back to the single point of failure/bottleneck that existed in ROS1… Wonder how you feel about it.

Hi Angelo,

We have been working on this for more than one year now. We showed prototypes even before you published your alternative, not only at the OMG but also within the ROS ecosystem, with some success cases already, and now we have the first alpha of a complete product: code, examples, comprehensive docs, videos, etc.

Three different DDS providers are working in our direction, and you already have several assessments here and at the OMG pointing out the pitfalls of your submission, so please consider the possibility that you could be wrong, or partially wrong.

Now, the process for me is clear: The OMG Evaluation team has asked for more information regarding our submissions. Please adhere to the process.

@Jaime_Martin_Losa you may have been working on this for a year. But you are very well aware that we demonstrated prototypes ages ago – as an example, look for the Huawei Europe Connect… In any case, for those interested in the actual history, the Internet is fairly good at keeping track of it.

I’d be happy to hear from you what the pitfalls of our submission are. Thus far, all points raised, including those from the evaluation team, were coming from either not reading all the document or assuming a restrictive interpretation.

But if you have real comments, you are welcome. I’d be happy to have a technical discussion.

I’ll state it again and wait for you to prove differently, but with objective and provable facts, our submission does more than yours and is more wire efficient!

If you feel the need to reply to this email, please do so only on technical matters.

@kydos

@kydos it seems to me that you are the one bringing up non-technical matters here regarding who did what first. It is also you who is making unsubstantiated statements about the relative capabilities and performance.

You may not like the points raised by the evaluation team but claiming that they are “not reading all the document” or their interpretation is “restrictive” is hardly an objective statement. Moreover it is disrespectful of the effort the independent evaluation team has put into the review and feedback.

I do agree it does not make sense to have this kind of discussion here; it is not a technical discussion, as you stated. The right forum for the technical discussion is the OMG evaluation team and task force.

Please stop trying to externalize and politicize the process.

My dearest @GerardoPardo, just one comment. Please, technical arguments, no more politics (I should probably make a T-shirt :wink:)

Technical discussions are always good in any forum.

@kydos

Hello everyone,

Please refer to the content below for a peek into a preliminary architecture of the micro-ROS European project that @Jaime_Martin_Losa brought up above (completely inspired by the work the OSRF is doing with ROS 2):

+-------------------------------------------------------------+
|               embedded application layer                    |
+-------------------------------------------------------------+
|             micro-ROS client library (urcl)                 |
|               e.g.: tf, lifecycle, executors, etc.          |
+-------------------------------------------------------------+
|             micro-ROS middleware interface (urmw)           |
+-------------------+---------------------+-------------------+
|    middleware 1   |     middleware 2    |   middleware 3    |
| (e.g. micro-RTPS) |     (e.g. mqtt)     |                   |
+-------------------+---------------------+-------------------+
|      Real-Time Operating System (RTOS) abstractions         |
+------------------+------------------+-----------------------+
|         RTOS 1   |       RTOS 2     |        RTOS 3         |
|     (e.g. NuttX) |    (e.g. RIOT)   |    (e.g. Zephyr)      |
+------------------+------------------+-----------------------+
|                         hardware                            |
+-------------------------------------------------------------+
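
To illustrate the layering idea only, here is a tiny sketch of a client library that talks to interchangeable middlewares through one interface. It is written in Python for brevity (a real implementation for microcontrollers would more likely be in C), and every name in it (`Middleware`, `LoopbackMiddleware`, `Node`) is hypothetical; none of this is the micro-ROS, urcl or urmw API.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List


class Middleware(ABC):
    """Stand-in for the middleware interface layer: what the client library relies on."""

    @abstractmethod
    def publish(self, topic: str, payload: bytes) -> None:
        ...

    @abstractmethod
    def subscribe(self, topic: str, callback: Callable[[bytes], None]) -> None:
        ...


class LoopbackMiddleware(Middleware):
    """Toy in-process backend; a real backend would wrap an actual protocol stack."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[bytes], None]]] = {}

    def publish(self, topic: str, payload: bytes) -> None:
        # Deliver the payload to every callback registered for this topic.
        for callback in self._subscribers.get(topic, []):
            callback(payload)

    def subscribe(self, topic: str, callback: Callable[[bytes], None]) -> None:
        self._subscribers.setdefault(topic, []).append(callback)


class Node:
    """Stand-in for the client library layer; it only knows the Middleware interface."""

    def __init__(self, name: str, middleware: Middleware) -> None:
        self.name = name
        self._middleware = middleware

    def create_publisher(self, topic: str) -> Callable[[bytes], None]:
        return lambda payload: self._middleware.publish(topic, payload)

    def create_subscription(self, topic: str, callback: Callable[[bytes], None]) -> None:
        self._middleware.subscribe(topic, callback)


if __name__ == "__main__":
    node = Node("sensor", LoopbackMiddleware())
    node.create_subscription("temperature", lambda data: print("received", data))
    node.create_publisher("temperature")(b"\x15")
```

Swapping the backend means passing a different `Middleware` implementation to `Node`; the application layer above it does not change.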

The project started only recently. For those developing XRCE solutions, I believe it’s a good moment to start keeping an eye on it (@GerardoPardo, @kydos).

An entry point for the project is available at https://github.com/microROS/micro-ROS.

I’m curious about the need for urcl. What is insufficient about rcl (both present and planned)?

Micro-ROS runs on NuttX, not Linux. There might be things we need to change, and there might be things we need to remove because of resource constraints. It’s not completely clear yet. Therefore, we are not expecting to use RCL as-is, that’s all.

Hello @gbiggs,

As @Ingo_Lutkebohle pointed out, due to a limited amount of resources in MCUs, we expect to have a “micro ros client library” (urcl) with a reduced set of functionalities. Ideally, we should converge into rcl but I believe that’s beyond the scope of our project.

Cheers,

I hope that consideration is given to that so that it does not become impossible in the future. It would be much better if we have a single C client library that works on all platforms, with features turned on and off as needed for resources and use cases. Otherwise there is a big risk of divergence.

@gbiggs,

So after a second thought and a few conversations with @astralien3000 and his colleagues, I’m happy to share we’re reconsidering our position.

There’s an ongoing prototype in the nuttx branch of erlerobot/riot-ros2 on GitHub, where we’re trying to converge through the generation of smarter cross-toolchains (just came up with that name). Let’s see where we get.

Cheers,

That’s the goal, really, and anything else is totally undecided as yet. “urcl” was just a placeholder name. While we initially thought that the architecture layering sketch was good for giving a rough idea, I now see it may be interpreted to mean more than it does. Therefore, I updated the repository description of micro-ROS/micro-ROS.github.io on GitHub (“A platform for seamless integration of resource constrained devices in the ROS ecosystem”) to be a bit more generic :wink:

Remember, this is all open source, and takes inspiration from a lot of work, both the original ROS2 embedded attempts, rosserial, as well as, most recently, the RIOT work by @astralien3000 and colleagues.

Glad to hear that’s the direction you want to go in. Resource usage reduction in rcl would benefit all users of ROS, since all the other client libraries build on top of it.