Using zero-copy transport in ROS2 with ros2_shm_msgs

using zero-copy transport in ROS2 with ros2_shm_msgs

motivation

zero-copy transport has been available in iceoryx, cyclonedds, FastDDS and eCAL, and the loaned_api can be used to get minimum copy to get better performance suce as lower cpu usage and transport latency.

Due to the unbounded nature of the point cloud and image message types being used in ROS2, these messages will not be supported by zero-copy in Galactic. To utilize zero-copy for point cloud and image data, message types will need to be revised.

ros2_shm_msgs

ros2_shm_msgs provides ROS2 image and pointcloud2 message definitions that support zero copy transportation, demos and tools also included.

api usage

there is a code snip in shm_image1m_talker, the usage is much like cv_bridge and pcl_conversions.

class Talker : public rclcpp::Node {
private:
  using Topic = shm_msgs::msg::Image1m;

public:
  explicit Talker(const rclcpp::NodeOptions &options)
      : Node("shm_image1m_talker", options) {

    m_input_cvimage->image = cv::imread("./res/img/205_182.png");
    m_input_cvimage->header.frame_id = "camera_link";
    m_input_cvimage->encoding = "bgr8";
    // cv::imshow("input image", m_input_cvimage->image);
    // cv::waitKey(0);

    auto publishMessage = [this]() -> void {

      auto loanedMsg = m_publisher->borrow_loaned_message();

      populateLoanedMessage(loanedMsg);

      m_publisher->publish(std::move(loanedMsg));
      // We gave up ownership and loanedMsg is not supposed to be accessed
      // anymore

      m_count++;
    };

    rclcpp::QoS qos(rclcpp::KeepLast(10));
    m_publisher = this->create_publisher<Topic>("shm_image_1m", qos);

    // Use a timer to schedule periodic message publishing.
    m_timer = this->create_wall_timer(0.1s, publishMessage);
  }

private:
  uint64_t m_count = 1;
  rclcpp::Publisher<Topic>::SharedPtr m_publisher;
  rclcpp::TimerBase::SharedPtr m_timer;
  std::shared_ptr<shm_msgs::CvImage> m_input_cvimage{std::make_shared<shm_msgs::CvImage>()};

  void populateLoanedMessage(rclcpp::LoanedMessage<Topic> &loanedMsg) {
    Topic &msg = loanedMsg.get();

    m_input_cvimage->header.stamp = now();

    m_input_cvimage->toImageMsg(msg);

    RCLCPP_INFO(this->get_logger(), "Publishing ");
  }
};

rviz bridge

as discussied in publish each image to its own topic for use in rqt_image or rviz2 · Issue #11 · MatthiasKillat/ros2_shm_vision_demo · GitHub

there is need for visualization of the custom ros2_shm_msgs types, so the rviz bridge is developed for this.

see this as an example

performance in benchmarks

see zero-copy performance of rmw_cyclonedds and rmw_fastrtps in GitHub - ZhenshengLee/ros2_jetson_benchmarks: The ros2 benchmarks for jetson xavier and pc.

which is tested within GitHub - ZhenshengLee/performance_test: Github repo for apex.ai performance_test for more middlewares.

performance in ros_shm_msgs demo

there are demos in the repo to show the performance like transport time and fps.

see this as an example

cd install/v4l2_camera/lib/v4l2_camera
./v4l2_camera_image1m_subscriber

[INFO] [1654154565.559710174] [shm_image1m_subscriber]: Received...
[INFO] [1654154565.559730953] [shm_image1m_subscriber]: get-image1m-transport-time: 0.253

performance in device driver

this lib has been used in the custom version of ros2_v4l2_camera
and rslidar_sdk, and the performance improvement is awesome!

For example, in my pc of dell 3630, the zero-copy transport of a shm_msgs::msg::Image1m can save about 80% of transport time, from 1.4ms to 0.3ms

issues

Feedback from community is welcomed!

status of usage of device drivers

to be added.

future work

the type adaptation and type negotiation in Humble ROS2 shall be applied in the repo.

10 Likes

@ZhenshengLee

thanks for sharing your experience and evaluation for LoanedMessage :+1:

a few minor comments that i have,

  • message types (or definition) should be agnostic from transport layer such as IPC, UDP or TCP. LoanedMessage depends on RMW middleware to see if the memory can be borrowed from underlying middleware. that says, application does not know if shared memory is used even if the message type is bound, could be heap memory, could be shared memory, could be special hardware buffer or anything. So I would name those message types with bound message types?
  • i think that it is about trading-off between memory and performance. using bound message types could bring us better performance with LoanedMessage but memory could be wasted at this moment. it would be better that RMW implementation can take care of unbound message (dynamic types) with LoanedMessage in the future.
  • Ultimately I would use and control page sizes for shared memory once it comes to performance, so that we can save the system trap and cache/TLB miss. This is also about trading-off between memory and performance since large pages require contiguous memory allocation. I am not even sure this is required for ROS aspect, probably implementation configuration or setting.

thanks,

2 Likes

@tomoyafujita Hi, thanks for your comment!

thanks for sharing your experience and evaluation for LoanedMessage

In ROS2 Design, LoanedMessage API is the way for zero-copy-inter-process-comm

LoanedMessage depends on RMW middleware to see if the memory can be borrowed from underlying middleware.

As far as I know, the zero-copy mechanism is based on shm in iceoryx, FastDDS, and eCAL

message types (or definition) should be agnostic from transport layer such as IPC, UDP or TCP.

Yes, you are right about the relationship between data exchange format and transport layer.
I will consider to refactor the whole repo after it gets more attention from community.

i think that it is about trading-off between memory and performance

Yes you are right, same opinion from How zero copy handles variable length data under same topic · eProsima/Fast-DDS · Discussion #2773 · GitHub

but memory could be wasted at this moment.

Yes, but usually in robotics algorithm system, the latency is more expensive while the memory is cheaper. If you agree?

it would be better that RMW implementation can take care of unbound message (dynamic types) with LoanedMessage in the future.

Some idea from eCAL and iceoryx may help, and I did’t read the docs of RTI Connext.

Also, the mechanism of zero copy of eCAL in eCAL has been moved — eCAL Documentation documentation does not require the bounded-size data types. Which may help.

this issue eclipse-iceoryx/iceoryx#911 describe an idea of dynamic size type support in iceoryx, which may help, but this has not been implemented yet.

Ultimately I would use and control page sizes for shared memory once it comes to performance, so that we can save the system trap and cache/TLB miss.

This kind of practice is about more concept of realtime system but less with performance optimization, which you can see in the pendulum demo in https://github.com/ros2-realtime-demo/pendulum/blob/rolling/pendulum_utils/include/pendulum_utils/memory_lock.hpp

2 Likes

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.