OK. I checked whether my ROS2 base was built in Release mode (it wasn't). With that fixed, I reran the profiling on the depth_to_pointcloud node, and this is the output:
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        image_name  symbol_name
24195    42.7473  wl  /wl
13177    23.2809  libsensor_msgs__rosidl_typesupport_coredx_cpp.so  sensor_msgs::msg::typesupport_coredx_cpp::convert_ros_message_to_dds(sensor_msgs::msg::PointCloud2_<std::allocator<void> > const&, sensor_msgs::msg::dds_::PointCloud2_&)
3405     6.0159   libsensor_msgs__rosidl_typesupport_coredx_cpp.so  sensor_msgs::msg::typesupport_coredx_cpp::convert_dds_message_to_ros(sensor_msgs::msg::dds_::Image_ const&, sensor_msgs::msg::Image_<std::allocator<void> >&)
3261     5.7615   libc-2.23.so  __memcpy_avx_unaligned
3056     5.3993   depth_to_pointcloud_node  void depth_to_pointcloud::convert<float>(std::shared_ptr<sensor_msgs::msg::Image_<std::allocator<void> > const> const&, std::shared_ptr<sensor_msgs::msg::PointCloud2_<std::allocator<void> > >&, image_geometry::PinholeCameraModel const&, double)
2871     5.0724   libc-2.23.so  __memmove_avx_unaligned
2490     4.3993   libdds_cf.so  /opt/coredx-4.0.16/target/Linux_2.6_x86_64_gcc43/lib/libdds_cf.so
1168     2.0636   libc-2.23.so  __memset_avx2
It's a little hard to read, but apart from wl, the worst offender is convert_ros_message_to_dds on the PointCloud2, which takes ~23%. Next comes convert_dds_message_to_ros on the Image, at ~6%, and then the actual conversion from Image to PointCloud2 (depth_to_pointcloud::convert<float>), at ~5%.
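To make the breakdown easier to compare, here is a rough Python sketch that sums the percentages from the report by category. The numbers are copied from the output above; the category names and the `classify` helper are just my own grouping for illustration, not anything oprofile produces.

```python
from collections import defaultdict

# (samples, percent, image_name) copied from the opreport output above.
ROWS = [
    (24195, 42.7473, "wl"),
    (13177, 23.2809, "libsensor_msgs__rosidl_typesupport_coredx_cpp.so"),
    (3405, 6.0159, "libsensor_msgs__rosidl_typesupport_coredx_cpp.so"),
    (3261, 5.7615, "libc-2.23.so"),
    (3056, 5.3993, "depth_to_pointcloud_node"),
    (2871, 5.0724, "libc-2.23.so"),
    (2490, 4.3993, "libdds_cf.so"),
    (1168, 2.0636, "libc-2.23.so"),
]


def classify(image_name):
    """Map an image name to a hypothetical category label (my own grouping)."""
    if "typesupport" in image_name:
        return "typesupport conversion"
    if image_name.startswith("libc-"):
        return "libc memory ops"
    if image_name.startswith("libdds"):
        return "DDS library"
    if image_name == "depth_to_pointcloud_node":
        return "node code"
    return image_name  # anything unrecognized (e.g. wl) keeps its own name


def share_by_category(rows):
    """Sum the percent column per category."""
    shares = defaultdict(float)
    for _samples, percent, image_name in rows:
        shares[classify(image_name)] += percent
    return dict(shares)


if __name__ == "__main__":
    shares = share_by_category(ROWS)
    # Print categories from largest to smallest share.
    for name, percent in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{percent:8.4f}%  {name}")
```

Grouped this way, the two typesupport conversions together account for a bit over 29% of the samples, compared with ~5% for the node's own convert function.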
@ClarkTucker maybe there’s something we can fix in the CoreDX rmw wrapper that can help speed this up?
Another idea: I could compose the astra driver and this conversion into a single process, which should eliminate part of the slowdown (the ~6% spent in the dds_to_ros conversion of the Image).
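As a back-of-the-envelope check on how much composition could buy, here is a small Python sketch using the usual Amdahl-style bound. It assumes the Image dds_to_ros conversion disappears entirely and nothing else changes, which is optimistic; `max_speedup` is just a helper name I made up.

```python
def max_speedup(fraction_removed):
    """Upper bound on speedup if a fraction of the total cycles is eliminated.

    Assumes the removed work costs nothing afterwards and the rest is unchanged.
    """
    if not 0.0 <= fraction_removed < 1.0:
        raise ValueError("fraction_removed must be in [0, 1)")
    return 1.0 / (1.0 - fraction_removed)


# Share of samples in convert_dds_message_to_ros for the Image, from the report.
IMAGE_DDS_TO_ROS = 6.0159 / 100.0

if __name__ == "__main__":
    print(f"best case: {max_speedup(IMAGE_DDS_TO_ROS):.3f}x")
```

So even in the best case this is only a speedup of roughly 6% against the whole profile; the PointCloud2 ros_to_dds conversion is the much bigger target.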
What about that ~43% going to wl? Is that the wireless driver? I guess that is the connection DDS would be using on my laptop…
Any other ideas?