TZC: Efficient Inter-Process Communication for Robotics Middleware with Partial Serialization

Hello

Here is an efficient inter-process communication (IPC) framework, which performs better than the existing ROS1 and ROS2 frameworks.

The TZC can be easily integrated into the ROS framework.

Have a look

Here is the base paper

Video Demonstration

Github Code

Courtesy: Dinesh Manocha



Any link to the library and code examples to get a feel of it?

The paper does not contain any link to an implementation of the transport logic the authors propose.

@awesomebytes I will get back to you regarding the code. I have sent a mail to the author. Hope he will respond soon.

Regards

Skimming through it, it would seem that from a very high level this is similar to using shared memory for the entire message (i.e. like nodelets in ROS 1) but then using pub-sub for the reference to the object in the shared-memory segment.

It’s not clear to me from the paper what the benefit is of splitting up the message in a part that uses regular serialisation (“control part”) and a part that gets pushed into shared-memory: it would seem that you get the disadvantages of using a shared message anyway (ie: nodes cannot change “their copy” without affecting the message contents for all subscribers), and messages cannot be presented to callbacks without having deserialised the control part.

The latency improvements are to be expected: (de)serialisation times in ROS 1 and 2 are typically dominated by fields that are larger than the L1 / L2 cache of the CPU used (depending on memory bandwidth), so if you can skip those it will instantly improve message throughput (almost linearly).

The authors also don’t discuss compatibility with nodes not using this transport. From the description it would seem there is no fallback to full serialisation when communicating with non-extended nodes. Nodelets are certainly not perfect, but at least supported some measure of graceful degradation.


Edit: re: why split the message: it’s not made very clear, but this sentence seems to provide a possible rationale:

The ETHZ-ASL framework eliminates copying operations, but multiple serialization operations remain. This is because the whole message is too complicated to be shared within shared memory without serialization.

“too complicated” is a bit vague though.


I guess that “too complicated” is a simple way to say that we can’t simply share objects in shared memory if they use dynamic memory allocation and non-contiguous segments of memory (in practice, every message which has a string or a vector of non-fixed size).

This is “serialization 101” :wink:

Sure, only PODs can be directly shared.

It would have been nice if the authors had made that more explicit though. The paper seems to step over that bit of rationale, which makes it not very obvious why they chose their approach.
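To make the POD point concrete, here is a minimal sketch in Python using `ctypes`. The `Pose` and `NamedPose` types are hypothetical and only serve as an illustration; they are not taken from the paper or from any ROS message definition. A fixed-size, pointer-free record can be copied byte for byte into a shared-memory segment and read back, while a record holding a string pointer carries only an address, which would be meaningless in another process.

```python
import ctypes
from multiprocessing import shared_memory


class Pose(ctypes.Structure):
    """Hypothetical fixed-size, pointer-free record: its bytes are the whole value."""
    _fields_ = [("x", ctypes.c_double),
                ("y", ctypes.c_double),
                ("theta", ctypes.c_double)]


class NamedPose(ctypes.Structure):
    """Hypothetical record with a pointer: the string bytes live elsewhere."""
    _fields_ = [("name", ctypes.c_char_p),
                ("x", ctypes.c_double)]


# A POD-like record can be placed in shared memory with a plain byte copy.
shm = shared_memory.SharedMemory(create=True, size=ctypes.sizeof(Pose))
raw = bytes(Pose(1.0, 2.0, 0.5))
shm.buf[:len(raw)] = raw
pose_copy = Pose.from_buffer_copy(shm.buf)  # what a reader would reconstruct
shm.close()
shm.unlink()

# The record with a string only stores an address, not the characters.
long_name = b"a" * 1000
named = NamedPose(long_name, 1.0)
named_bytes = bytes(named)  # pointer + double, far smaller than the string
```

Copying `named_bytes` into shared memory would hand the reader a pointer into the writer's address space, which is why messages with strings or variable-length arrays need either serialization or a different layout.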

Hi, @awesomebytes @sagniknitr. I am one of the authors. Thanks for your interest.
The code is not fully prepared for reading yet. I hope we can sort it out soon.


Thanks for your comments! Here are some responses. I hope they will make my point clearer.

  1. We have argued in Section II.C that intra-process communication (such as nodelets) is the only efficient solution for now. But the obvious drawback of intra-process communication is fault isolation. Since all modules run within the same process, when any module crashes, the entire system crashes. There are applications that pay more attention to reliability and TZC can provide an option.

  2. Combining socket and shared-memory may inherit their disadvantages, but also their advantages. By using socket for the control part, we can use compatible select/poll notification interfaces; and by using shared-memory for the data part, we can skip serialization for most of the data. As for the disadvantages you have mentioned, I don’t consider them serious because:
    2.1. If a subscriber needs to change the message, it can always copy the message, edit its own copy, and pay the copying time. But there are practical callbacks that do not need to change the message, and TZC provides an optimization for them.
    2.2. Shared-memory IPC does not provide a proper synchronization (or notification) mechanism. We have to notify the subscribers through another channel. The control part is used for that, and it is usually small enough (although it is larger than a reference) that its serialization latency can be neglected.

  3. The latency improvements are to be expected, IF we can skip those serialization operations. How to skip serialization for inter-process communication is the main contribution of this paper.

  4. As shown in the example code, TZC generates new message types and works per topic. You can always publish ROS (1 or 2) messages without TZC.

  5. About why we split the message. ROS transmits all message data through a socket and ETHZ-ASL transmits all message data through shared memory, but neither of them can avoid serialization. We split the message to avoid serialization of most of the message data (i.e. the data part).
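The split described in points 2 and 5 can be sketched roughly as follows. This is not the TZC API: the function names, the JSON encoding of the control part, the 4-byte length prefix, and the use of `socket.socketpair()` in a single process are all assumptions made to keep the example short and runnable. The bulk data is written once into a shared-memory segment, and only a small control part (header, segment name, size) is serialized and sent over the socket, which also acts as the notification channel.

```python
import json
import socket
from multiprocessing import shared_memory


def recv_exact(sock, n):
    """Read exactly n bytes from a stream socket."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("socket closed early")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def publish(sock, header, payload):
    """Data part goes to shared memory; only the control part is serialized."""
    shm = shared_memory.SharedMemory(create=True, size=len(payload))
    shm.buf[:len(payload)] = payload
    control = json.dumps(
        {"header": header, "shm": shm.name, "size": len(payload)}
    ).encode()
    sock.sendall(len(control).to_bytes(4, "big") + control)
    return shm  # the publisher keeps the segment alive


def receive(sock):
    """Deserialize the small control part, then attach to the data part."""
    length = int.from_bytes(recv_exact(sock, 4), "big")
    control = json.loads(recv_exact(sock, length))
    shm = shared_memory.SharedMemory(name=control["shm"])
    return control, shm


pub_sock, sub_sock = socket.socketpair()
payload = bytes(range(256)) * 4096  # 1 MiB stand-in for image data
pub_shm = publish(pub_sock, {"seq": 1, "frame_id": "camera"}, payload)

control, sub_shm = receive(sub_sock)
view = sub_shm.buf[:control["size"]]  # read in place, no deserialization
payload_matches = (view == payload)
seq = control["header"]["seq"]

# Release the view before closing, then free the segment.
view.release()
sub_shm.close()
pub_shm.close()
pub_shm.unlink()
pub_sock.close()
sub_sock.close()
```

Because the control part travels over an ordinary socket, a subscriber can wait on it with select/poll as usual. As point 2.1 notes, the subscriber sees the shared bytes directly, so it has to make its own copy if it wants to modify them.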


At a glance it looks very similar to http://wiki.ros.org/shm_transport
Are there significant advantages in TZC?


Thanks for the clarifications @Jrdevil-Wang.

Just to make sure: backward compatibility with nodes not using the new transport is not supported, correct?

In fact, shm_transport is our previous work and TZC is based on it.

Soon after we open-sourced shm_transport, we found that it is much like the ETHZ-ASL shared-memory framework in terms of performance. Both shm_transport and ETHZ-ASL avoid copying operations by serializing each message into shared memory at the publisher side and de-serializing it at the subscriber side. Such a mechanism can reduce the latency by a factor of 2-3 compared with ROS. However, we were not satisfied, because the serialization operations are still a performance bottleneck. This is what inspired TZC.

TZC not only avoids copying operations, but also avoids serialization operations for most of the data. Therefore, the latency no longer increases as the message size grows, which is good for large message transmission.
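A small illustration of why the serialized portion stops scaling with message size, assuming a hypothetical fixed header of sequence number, timestamp, and payload length (this layout is an assumption, not TZC's wire format). It compares byte counts only and does not measure latency.

```python
import struct

# Hypothetical control part: uint32 seq, float64 stamp, uint64 payload size.
HEADER = struct.Struct("<IdQ")


def full_serialize(seq, stamp, payload):
    """Conventional approach: header and payload both pass through serialization."""
    return HEADER.pack(seq, stamp, len(payload)) + payload


def control_part(seq, stamp, payload):
    """Partial serialization: only the fixed-size header is serialized."""
    return HEADER.pack(seq, stamp, len(payload))


sizes = [1_000, 1_000_000]
full = [len(full_serialize(1, 0.0, bytes(n))) for n in sizes]
ctrl = [len(control_part(1, 0.0, bytes(n))) for n in sizes]
# full grows with the payload (1020 and 1000020 bytes); ctrl stays at 20 bytes.
```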

Currently yes. But as we have mentioned in the future work, we are planning to provide this kind of compatibility. For ROS1, the publisher-subscriber link mechanism makes this plan feasible, but we still have no idea how to support it for ROS2.

Hi everyone @awesomebytes, @sagniknitr!
I have just shared the source code of TZC at https://github.com/Jrdevil-Wang/tzc_transport.
Please have a try and contact me if there are any concerns or suggestions.

Great job! I’ve seen your slides from the 6th ROS Summer School in China. Is a ROS2 implementation available?

Thank you for sharing TZC as open source software. This could be another candidate for a ROS2 middleware implementation.

The big challenge here is to keep the performance of the low-level IPC layer when integrating it into large frameworks like ROS2 across the different abstraction layers (IPC > DDS > RCL).

To keep the overall software architecture layered, a ROS user

  • should not think about dynamic and fixed size message types

  • should not need to decide whether a payload is transported by local IPC or over a network, and should be able to rely on all subscribers getting synchronized access to the same content

So finally, when it comes to an integration into ROS2 as an RMW, you need some good ideas to preserve as much performance as possible without breaking the APIs of the different intermediate middlewares, regardless of whether you use TZC, IceOryx or eCAL as the IPC implementation.
