Image and PointCloud2 performance issue in ROS2 Python

When I handle large messages such as camera Image or PointCloud2 from Python code on ROS2, the performance is terrible.

Problem:

  • Issue1: Running “ros2 topic echo /topic_pointcloud” prints no messages -> [Closed]
    Description: Launch ros2_intel_realsense via realsense_ros2_camera, and subscribe to “/camera/depth/color/points” with “ros2 topic echo /camera/depth/color/points”. It takes dozens of seconds to print the PointCloud2 (640x480) data. On ROS1, rostopic shows the data immediately.

  • Issue2: Python sub/pub performance for large data is worse than C++ -> [Closed]
    Description: Subscribe to “/camera/depth/color/points” (sensor_msgs::msg::PointCloud2) with both the C++ subscription API and Python subscription code; the frame rate differs. The C++ code runs at 4 Hz, but the Python code at only 0.4 Hz.

Debugging points to the time being spent in the function convert_to_py, which converts the message to Python, at _rclpy.c:2261.

Any ideas on how to fix this performance issue with camera messages?

UPDATE:
Issue1: Fixed by PR
Issue2: Worked around with “export PYTHONOPTIMIZE=0”


This is a common problem in ROS1. There, the ECL package helps to address it: http://wiki.ros.org/ecl_ipc/Tutorials/Shared%20Memory. In essence, the problem is solved by using shared memory to pass large data between processes instead of the standard ROS pub/sub pipe.

I thought this should not be a problem in ROS2, because ROS2 was created specifically to solve this and other ROS1 problems. If the issue remains in ROS2, I’d try the same approach as in ROS1 and pass Images and PointClouds through shared memory.

Update on debugging status:
I have written test code, rttest_sample, to publish a full-length PointCloud2 on the topic /rttest_sample. The results are:
With the data size set to 64x48, “ros2 topic echo /rttest_sample” works.
With the data size set to 640x48, “ros2 topic echo /rttest_sample” takes ~10 s.
With the data size set to 640x480, “ros2 topic echo /rttest_sample” takes more than 60 s.
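
For a sense of scale, the payload at these three resolutions can be estimated as width x height x point_step. The sketch below assumes a point_step of 16 bytes per point; the real value depends on the fields the publisher fills in and is not stated in this thread, and payload_bytes is just an illustrative helper.

```python
# Rough PointCloud2 payload estimate: width * height * point_step bytes.
# ASSUMPTION: point_step = 16 bytes per point; the real value depends on the
# fields the publisher fills in and is not given in this thread.
ASSUMED_POINT_STEP = 16


def payload_bytes(width, height, point_step=ASSUMED_POINT_STEP):
    """Return the size in bytes of the PointCloud2 data array."""
    return width * height * point_step


if __name__ == "__main__":
    for width, height in [(64, 48), (640, 48), (640, 480)]:
        size = payload_bytes(width, height)
        print(f"{width}x{height}: {size} bytes ({size / 1e6:.2f} MB)")
```

Under that assumption the 640x480 case is about 4.9 MB per message, a hundred times the 64x48 case.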

When I use ros1_bridge and echo the same (ROS2) topic in ROS1, it works well at 640x480.
Debugging points to this code:

rcl/src/rcl/wait.c::rcl_wait

[INFO] [rcl]: Initializing wait set with '0' subscriptions, '2' guard conditions, '0' timers, '0' clients, '0' services
timeout = -1
[INFO] [rcl]: Waiting without timeout
[INFO] [rcl]: Timeout calculated based on next scheduled timer: false
// The wait happens here:
rmw_ret_t ret = rmw_wait(
  &wait_set->impl->rmw_subscriptions,
  &wait_set->impl->rmw_guard_conditions,
  &wait_set->impl->rmw_services,
  &wait_set->impl->rmw_clients,
  wait_set->impl->rmw_wait_set,
  timeout_argument);

I don’t know why rmw_wait takes so much time.


There seems to be at least one other user for whom PointCloud2 transmission is much worse on ROS2 compared to ROS1:

https://answers.ros.org/question/298352/new-message-format-for-compressed-pointcloud2

I guess this will be fixed at some point?

@yechun We performance-tested all sorts of small and large data here and got latencies of ~20 ms for 2 MB PointCloud2 messages: https://github.com/ros2/rmw_fastrtps/pull/203#issuecomment-399778193.

What I think is happening is that you are trying to publish PointCloud2 data with the default QoS settings (https://github.com/ros2/rmw/blob/master/rmw/include/rmw/qos_profiles.h#L43-L50) which are set to reliable and volatile.

The ros2 topic echo command uses the sensor QoS (https://github.com/ros2/ros2cli/blob/master/ros2topic/ros2topic/verb/echo.py#L134), which is best effort.

I would suggest using the sensor QoS in your publisher (see the example at https://github.com/ros2/ros2/wiki/About-Quality-of-Service-Settings).

Otherwise it does not make much sense to print PointCloud2 messages to a console.

For performance testing I would suggest using this tool instead: https://github.com/ApexAI/performance_test.

D.

@Martin_Guenther @Dejan_Pangercic Thank you for sharing your experience; it is very useful for us.

It looks like there were several different issues discussed here.

The ros2 topic issue seems to be a bug in the truncation of the output, which makes printing to the console take a very long time. This should be addressed by https://github.com/ros2/ros2cli/pull/126

The difference in FPS between C++ and Python can be addressed by running the Python interpreter in optimized mode: setting the environment variable PYTHONOPTIMIZE=0 or passing -O to the python invocation. (related ROS answers post here)
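
Optimized mode strips assert statements and sets __debug__ to False, so any per-element validation written as an assertion is skipped. The sketch below illustrates this with a hypothetical setter; set_data is made up for illustration and is not the rclpy or generated-message code, and whether such assertions are what slows down the conversion here is an assumption.

```python
import sys


def optimized_mode():
    """Return True when the interpreter runs with -O (asserts are stripped)."""
    return sys.flags.optimize > 0


def set_data(values):
    """Hypothetical setter that validates every element with an assert.

    Under -O the assert below is compiled away, so the per-element check
    over a large array is never executed.
    """
    assert all(isinstance(v, int) and 0 <= v < 256 for v in values), \
        "data must contain only uint8 values"
    return bytes(values)


if __name__ == "__main__":
    print("optimized mode:", optimized_mode())
    print("__debug__:", __debug__)
    payload = set_data([0] * 1000)
    print("payload length:", len(payload))
```

Running the same file once with plain python3 and once with python3 -O should show the optimized-mode flag change while the payload stays the same.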

HTH,

@marguedas
Thanks for your comments. I have just updated the issue status. Only one issue is left: why Python is much slower than C++ when subscribing to PointCloud2 messages. I will try PYTHONOPTIMIZE to verify.

@marguedas Just verified. After setting PYTHONOPTIMIZE:

export PYTHONOPTIMIZE=0

The FPS of the Python code increased; the result is very close to C++.