I figured the function is pretty basic: iterate over every pixel of the image (n^2 for an n×n image) and convert the result to PointCloud2 format. I'm already compiling in Release, so optimizations are all on. So maybe the GPU is the only way to get real acceleration.
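For reference, the per-pixel work described above is roughly a pinhole back-projection of each depth pixel to x/y/z. This is a minimal sketch, not depth_image_proc's actual code. It assumes a uint16 depth image in millimetres, hypothetical `Intrinsics` values fx/fy/cx/cy (in ROS these would come from CameraInfo), and a packed 12-byte x/y/z layout rather than the exact PointCloud2 field layout.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <vector>

// Hypothetical pinhole intrinsics; in ROS these would come from sensor_msgs/CameraInfo.
struct Intrinsics {
  float fx, fy, cx, cy;
};

// Converts a uint16 depth image (millimetres) into a packed buffer of float32
// x, y, z triples (12 bytes per point, assumed layout). Zero depth becomes NaN.
std::vector<uint8_t> depthToXyz(const std::vector<uint16_t>& depth_mm,
                                int width, int height, const Intrinsics& k) {
  const float nan = std::numeric_limits<float>::quiet_NaN();
  std::vector<uint8_t> out(static_cast<std::size_t>(width) * height * 3 * sizeof(float));
  uint8_t* dst = out.data();
  // One pass over every pixel: cost is O(width * height).
  for (int v = 0; v < height; ++v) {
    for (int u = 0; u < width; ++u) {
      const uint16_t d = depth_mm[static_cast<std::size_t>(v) * width + u];
      float xyz[3];
      if (d == 0) {
        xyz[0] = xyz[1] = xyz[2] = nan;  // no depth reading for this pixel
      } else {
        const float z = static_cast<float>(d) / 1000.0f;  // millimetres -> metres
        xyz[0] = (u - k.cx) * z / k.fx;
        xyz[1] = (v - k.cy) * z / k.fy;
        xyz[2] = z;
      }
      std::memcpy(dst, xyz, sizeof(xyz));
      dst += sizeof(xyz);
    }
  }
  return out;
}
```

Every pixel is computed independently, which is why this kind of loop maps well onto a GPU (or onto SIMD/multithreading on the CPU).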
But all that aside, I did some performance analysis with the operf tool, running it against both the ROS1 and ROS2 versions of astra_camera_node and depth_image_proc/point_cloud_xyz (depth_to_pointcloud).
It looks like both versions spend about the same time in the same functions, but the ROS2 variants spend extra time in sensor_msgs::msg::typesupport_coredx_cpp::convert_ros_message_to_dds(sensor_msgs::msg::PointCloud2_<std::allocator<void> > const&, sensor_msgs::msg::dds_::PointCloud2_&) from libsensor_msgs__rosidl_typesupport_coredx_cpp.so.
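Judging only from that signature (a const ROS message in, a separate DDS message out), the conversion appears to copy the whole message, including the PointCloud2 data byte array, so its cost should grow with cloud size. The sketch below uses hypothetical stand-in structs, not the real generated types, to illustrate that; the 640×480 image and 16-byte point_step in the usage line are assumptions for illustration only.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the ROS-side and DDS-side PointCloud2 types.
struct RosPointCloud2 {
  uint32_t width, height, point_step;
  std::vector<uint8_t> data;
};
struct DdsPointCloud2 {
  uint32_t width, height, point_step;
  std::vector<uint8_t> data;
};

// Mirrors only the shape of convert_ros_message_to_dds(const Ros&, Dds&):
// the source is const and the destination is a separate object, so the
// payload has to be copied into it.
void convertRosToDds(const RosPointCloud2& in, DdsPointCloud2& out) {
  out.width = in.width;
  out.height = in.height;
  out.point_step = in.point_step;
  out.data = in.data;  // deep copy of width * height * point_step bytes
}

// Bytes copied per message for a dense cloud of the given size.
std::size_t bytesCopiedPerMessage(uint32_t width, uint32_t height, uint32_t point_step) {
  return static_cast<std::size_t>(width) * height * point_step;
}
```

Under those assumptions, bytesCopiedPerMessage(640, 480, 16) is 4,915,200 bytes, i.e. roughly 4.9 MB copied on every publish in addition to the conversion loop itself.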
I'm not sure yet whether there are techniques to speed up or avoid this new conversion step.
[My next step is to verify that my ROS2 base was compiled in release mode…]