Astra driver for ROS2

I noticed there is a repo for using the Astra camera in a ROS2 deployment.
I forked the code and am trying to run it; it compiles just fine, but it doesn’t actually deliver any data.
I can run OpenCV and it connects and delivers data.
I can run the ROS1 driver, and it connects and delivers data.
I can run the OpenNI sample viewer, and it connects and delivers data.

So my question is:
Am I doing something wrong in ROS2 that’s preventing it from working?
Or is this driver still under development?

I’m just putting the code into a workspace, like usual, and building it with ‘ament build’.
When I run it, I see ‘Starting IR stream’ and ‘Starting depth stream’, and the topics are all there. But I don’t see any data flowing.
I used CLion to set breakpoints in the code, and it didn’t appear that any callbacks were being triggered.

Anyway, any help would be great.
Thanks!

I’ve used it in the past for sure, but it has been under active development lately for a demo we’re trying to put together.

@clalancette may have more info for you.

I do have an Astra Pro. Maybe there’s an issue there?

I know that the Astra Pro had issues in ROS1 because it advertises itself as a UVC device for the RGB stream and as a separate USB device for the depth stream. The ROS2 version of the driver did work for me by just building it, installing the udev rules, and running it. We never tried it on an Astra Pro (AFAIK), though. @Kukanani and @clalancette have been playing with it lately; they may be able to give more input.

I did do some work on it, though I haven’t really changed much in a few weeks. In my testing, things seem to be working and I’m getting good depth images out of it.

The fact that you don’t seem to be getting callbacks is interesting, though. Note that there are two levels of callbacks: when the driver actually has a frame, it calls AstraFrameListener::onNewFrame(), and if the frame is “valid”, onNewFrame() then calls the higher-level AstraDriver::newDepthFrameCounter(). Where did you put your breakpoint? Can you see if you are getting the onNewFrame() callback?
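To make the two levels concrete, here’s a minimal, self-contained sketch of that flow (the types here are stand-ins, not the actual driver code):

```cpp
#include <functional>
#include <iostream>

// Hypothetical stand-ins for the driver types; the real driver wires
// AstraFrameListener into the OpenNI stream-callback mechanism.
struct Frame { bool valid; int id; };

class AstraFrameListener {
public:
  // Level 1: called by the lower layer whenever the device produces a frame.
  void onNewFrame(const Frame & frame) {
    std::cout << "onNewFrame: frame " << frame.id << "\n";
    if (frame.valid && callback_) {
      callback_(frame);  // Level 2: hand the frame up to the driver
    }
  }
  void setCallback(std::function<void(const Frame &)> cb) { callback_ = std::move(cb); }

private:
  std::function<void(const Frame &)> callback_;
};

int main() {
  AstraFrameListener listener;
  // The higher-level driver callback would publish the ROS2 message here.
  listener.setCallback([](const Frame & f) {
    std::cout << "driver-level callback: publishing frame " << f.id << "\n";
  });
  listener.onNewFrame({true, 1});   // reaches both levels
  listener.onNewFrame({false, 2});  // stops at level 1: "invalid" frame is dropped
}
```

If you break inside onNewFrame() and never hit it, the problem is below the driver, in the device/OpenNI layer; if you hit it but never reach the driver-level callback, the frames are being dropped as invalid.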

One other thing I’ll mention is that I have seen cases where the camera will stop streaming data after starting and stopping the driver a few times. This seems to be some kind of firmware level thing, and if you unplug the camera, wait a couple of seconds, then plug it back in, it tends to start working again. That would also be worth trying.

So, I switched out my Astra Pro for an Astra Mini, and the driver just worked.
There are some issues, so I’m going to open up an Issue on GitHub.

Thanks for the help.

Anyone looked into the performance comparison of the ROS1 driver vs ROS2 driver?
I’m noticing a HUGE degradation in performance using the ROS2 driver. Wondering if it’s my setup or if the driver just isn’t fleshed out enough to be performant.

My test system: MacBook Pro (i7, 2.5 GHz) running Ubuntu Xenial, connected to an Astra Mini.

  • ROS1 driver delivers ~30 fps and consumes ~70% CPU.
  • ROS2 driver, with CoreDX as the rmw implementation, delivers ~5 fps and consumes ~170% CPU.

Yeah, I’ve had performance issues as well. I was able to mostly work around it by disabling the color and IR cameras and setting the depth camera to 320x240. That gets me about 15 frames/sec with something like 100% CPU time, which is good enough for what I need at the moment. But we should really look into improving it.
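For what it’s worth, at the OpenNI level that workaround amounts to roughly the following (the ROS2 driver exposes this through its own parameters; this sketch just shows what the resolution change boils down to):

```cpp
#include <OpenNI.h>
#include <cstdio>

// Minimal OpenNI2 sketch of the same workaround: depth-only at 320x240.
int main()
{
  if (openni::OpenNI::initialize() != openni::STATUS_OK) {
    std::printf("init failed: %s\n", openni::OpenNI::getExtendedError());
    return 1;
  }

  openni::Device device;
  if (device.open(openni::ANY_DEVICE) != openni::STATUS_OK) return 1;

  openni::VideoStream depth;
  depth.create(device, openni::SENSOR_DEPTH);  // no color/IR streams created

  openni::VideoMode mode;
  mode.setResolution(320, 240);
  mode.setFps(30);
  mode.setPixelFormat(openni::PIXEL_FORMAT_DEPTH_1_MM);
  depth.setVideoMode(mode);

  depth.start();
  openni::VideoFrameRef frame;
  depth.readFrame(&frame);  // blocks until one depth frame arrives
  std::printf("got %dx%d depth frame\n", frame.getWidth(), frame.getHeight());

  depth.stop();
  depth.destroy();
  device.close();
  openni::OpenNI::shutdown();
  return 0;
}
```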

Ouch.

I did the same thing as you, and I was getting 30 fps from the driver. (although this was all going through the ROS2 dynamic_bridge, so there may be some lag).
The depth_to_pointcloud node drops that in half to ~15 fps for the PointCloud message. That surprises me. Is that conversion computation really that time consuming? Maybe we can make it faster using GPUs?

I should be a little more clear that I am running all of this on a Pine 64 board (https://pine64.org), so it won’t be as fast as on an Intel chip.

That being said, I don’t really know how much CPU time this should take up. My feeling is that there is room for improvement both in the astra driver and in depth_to_pointcloud, but I haven’t profiled to find out where the bottlenecks are. Before I went all the way to a GPU, I would profile, find the problem points, and try to optimize those first (this could be as simple as building with optimization and/or NEON turned on, neither of which I’m doing at the moment).

I just figured that the function is pretty basic: iterate over every pixel of the image and convert each one to the PointCloud2 format. I’m already compiling in Release, so optimization is all on. So maybe GPU is the only way to go for real acceleration.
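For reference, the per-pixel work is roughly the pinhole back-projection below (a minimal sketch with assumed names, not the actual depth_to_pointcloud code); at 320x240 that’s ~77k points per frame:

```cpp
#include <cstddef>
#include <vector>

struct PointXYZ { float x, y, z; };

// Minimal sketch of the per-pixel work in a depth-to-pointcloud conversion:
// pinhole back-projection. fx, fy, cx, cy come from the camera intrinsics
// (CameraInfo / PinholeCameraModel); depth is in metres here for simplicity.
std::vector<PointXYZ> depthToPoints(const std::vector<float> & depth,
                                    int width, int height,
                                    float fx, float fy, float cx, float cy)
{
  std::vector<PointXYZ> points;
  points.reserve(static_cast<std::size_t>(width) * height);
  for (int v = 0; v < height; ++v) {
    for (int u = 0; u < width; ++u) {
      const float z = depth[static_cast<std::size_t>(v) * width + u];
      points.push_back({(u - cx) * z / fx,   // back-project column index
                        (v - cy) * z / fy,   // back-project row index
                        z});
    }
  }
  return points;
}

int main()
{
  // Toy example: a 2x2 depth image with made-up intrinsics.
  std::vector<float> depth = {1.0f, 1.0f, 2.0f, 2.0f};
  auto pts = depthToPoints(depth, 2, 2, 525.0f, 525.0f, 1.0f, 1.0f);
  return pts.size() == 4 ? 0 : 1;
}
```

The real node additionally writes each point into the PointCloud2 byte buffer, so there is a memcpy-like cost on top of the arithmetic.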

But all that aside, I did some performance analysis using the operf tool and ran it against the ROS1 and ROS2 versions of both nodes: the camera driver (astra_camera_node) and the conversion node (depth_image_proc/point_cloud_xyz in ROS1, depth_to_pointcloud in ROS2).
It looks like they both spend about the same time in the same functions, but the ROS2 variants spend extra time in libsensor_msgs__rosidl_typesupport_coredx_cpp.so sensor_msgs::msg::typesupport_coredx_cpp::convert_ros_message_to_dds(sensor_msgs::msg::PointCloud2_<std::allocator<void> > const&, sensor_msgs::msg::dds_::PointCloud2_&).
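For context, a generated convert function like that is essentially a field-by-field deep copy between two parallel struct definitions. A rough sketch of its shape (hypothetical stand-in types, not the real generated code):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the ROS and DDS message structs; the real
// generated code copies many more fields (header, fields[], etc.).
struct RosPointCloud2 { uint32_t width, height; std::vector<uint8_t> data; };
struct DdsPointCloud2 { uint32_t width, height; std::vector<uint8_t> data; };

// In essence, the convert function is a deep copy of every field.
void convert_ros_message_to_dds(const RosPointCloud2 & ros, DdsPointCloud2 & dds)
{
  dds.width = ros.width;
  dds.height = ros.height;
  dds.data = ros.data;  // full copy of the payload -- this is where the time goes
}

int main()
{
  // 640x480 points at a (made-up) 16 bytes per point: ~4.9 MB per publish.
  RosPointCloud2 ros{640, 480, std::vector<uint8_t>(640 * 480 * 16)};
  DdsPointCloud2 dds;
  convert_ros_message_to_dds(ros, dds);
  return dds.data.size() == ros.data.size() ? 0 : 1;
}
```

For a full-resolution point cloud, the data member alone can be several megabytes per frame, which would explain the time spent there.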

Not sure if there are techniques yet that help accelerate or overcome this new conversion step.
[My next step is to verify that my ROS2 base was compiled in release mode…]

OK, verified my ROS2 base is now built in Release mode (it wasn’t before).
I reran the profiling on the depth_to_pointcloud node, and this is the output:

```
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
samples   %        image_name                                         symbol_name
24195     42.7473  wl                                                 /wl
13177     23.2809  libsensor_msgs__rosidl_typesupport_coredx_cpp.so   sensor_msgs::msg::typesupport_coredx_cpp::convert_ros_message_to_dds(sensor_msgs::msg::PointCloud2_<std::allocator<void> > const&, sensor_msgs::msg::dds_::PointCloud2_&)
 3405      6.0159  libsensor_msgs__rosidl_typesupport_coredx_cpp.so   sensor_msgs::msg::typesupport_coredx_cpp::convert_dds_message_to_ros(sensor_msgs::msg::dds_::Image_ const&, sensor_msgs::msg::Image_<std::allocator<void> >&)
 3261      5.7615  libc-2.23.so                                       __memcpy_avx_unaligned
 3056      5.3993  depth_to_pointcloud_node                           void depth_to_pointcloud::convert<float>(std::shared_ptr<sensor_msgs::msg::Image_<std::allocator<void> > const> const&, std::shared_ptr<sensor_msgs::msg::PointCloud2_<std::allocator<void> > >&, image_geometry::PinholeCameraModel const&, double)
 2871      5.0724  libc-2.23.so                                       __memmove_avx_unaligned
 2490      4.3993  libdds_cf.so                                       /opt/coredx-4.0.16/target/Linux_2.6_x86_64_gcc43/lib/libdds_cf.so
 1168      2.0636  libc-2.23.so                                       __memset_avx2
```

A little hard to read, but basically the worst offender is the convert_ros_to_dds on the PointCloud, which takes ~23%. Then comes convert_dds_to_ros on the Image, at ~6%, and then finally the actual conversion from Image to PointCloud2, at ~5%.

@ClarkTucker maybe there’s something we can fix in the CoreDX rmw wrapper that can help speed this up?

Another idea I had: I could build a composition of the astra driver and this conversion node in a single process and eliminate part of the slowdown (the ~6% spent on the dds_to_ros conversion of Image).
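Roughly, the hand-off I have in mind looks like this sketch (exact rclcpp signatures may differ between versions; intra-process comms has to be enabled on the node for the zero-copy path):

```cpp
#include <memory>
#include <utility>
#include "rclcpp/rclcpp.hpp"
#include "sensor_msgs/msg/image.hpp"

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  // Intra-process comms lets a std::unique_ptr message be moved, not copied,
  // to subscribers living in the same process.
  auto node = std::make_shared<rclcpp::Node>(
    "astra_composed", rclcpp::NodeOptions().use_intra_process_comms(true));
  auto pub = node->create_publisher<sensor_msgs::msg::Image>("depth/image_raw", 10);

  auto msg = std::make_unique<sensor_msgs::msg::Image>();
  msg->width = 320;
  msg->height = 240;
  // ... fill msg->data from the camera frame ...
  pub->publish(std::move(msg));  // ownership handed off; no in-process serialization

  rclcpp::shutdown();
  return 0;
}
```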

What about that ~40% going to wl? Is that the wireless driver? I guess that is the connection DDS would be using on my laptop…

Any other ideas?

Nice debugging. Out of curiosity, exactly how did you run operf (just so it is archived here for the future)?

I was wondering about that wl myself. The next time I have some time (probably next week), I might try this locally to see what happens when I’m hooked to ethernet instead of wifi. That would be interesting.

I also notice that there is ~10% going to unaligned memmove and memcpy. I’m guessing that if we could find a way to ensure those buffers are aligned, that would also be faster.
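One untested thought along those lines: give the payload buffers an over-aligned allocator, something like this sketch:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#include <vector>

// Untested idea: a 32-byte-aligned allocator for message payload buffers, so
// the data handed to memcpy/memmove always starts on an AVX-friendly boundary.
template <typename T>
struct AlignedAllocator
{
  using value_type = T;
  static constexpr std::size_t alignment = 32;

  AlignedAllocator() = default;
  template <typename U> AlignedAllocator(const AlignedAllocator<U> &) {}

  T * allocate(std::size_t n)
  {
    void * p = nullptr;
    if (posix_memalign(&p, alignment, n * sizeof(T)) != 0) {
      throw std::bad_alloc();
    }
    return static_cast<T *>(p);
  }
  void deallocate(T * p, std::size_t) { std::free(p); }
};

template <typename T, typename U>
bool operator==(const AlignedAllocator<T> &, const AlignedAllocator<U> &) { return true; }
template <typename T, typename U>
bool operator!=(const AlignedAllocator<T> &, const AlignedAllocator<U> &) { return false; }

// A payload buffer guaranteed to start on a 32-byte boundary:
using AlignedBytes = std::vector<unsigned char, AlignedAllocator<unsigned char>>;

int main()
{
  AlignedBytes buf(1024);  // 1 KiB payload
  return reinterpret_cast<std::uintptr_t>(buf.data()) % 32 == 0 ? 0 : 1;
}
```

Whether glibc actually picks a different memcpy variant based on runtime alignment would need to be verified, though.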

Other than those things, yeah, I’d say we’d have to concentrate on convert_ros_message_to_dds and see what we could improve there.

I downloaded the oprofile package and installed it.
To use it, you have to have root privileges, so I created a bash script that I can run under sudo:

```bash
#!/usr/bin/env bash
export ROS_DOMAIN_ID=8
export RMW_IMPLEMENTATION=rmw_coredx_cpp
source <ros2_base_install>/local_setup.bash
source <depth_to_pointcloud_install>/local_setup.bash
operf /absolute/path/to/depth_to_pointcloud_node
```

Running sudo ./my_bash_script.bash does the trick.
The reason I chose oprofile instead of gprof (or other) is that you can kill the process and it still collects info.

To generate the statistics you have to run:
opreport --symbols

There could be something obvious that would significantly improve convert_ros_message_to_dds for CoreDX, but it would be interesting to see if this is also the case with Connext (I’m not sure FastRTPS would be apples to apples, since it uses type support introspection to serialize/deserialize).

We always knew that this function was wasted work, but to get rid of it, we need the ability in the middleware to publish and take serialized data directly, rather than delegating serialization/deserialization to the middleware. That way we can go from the “ROS message struct” to the CDR serialized buffer directly, skipping the convert function. I know we could do this in FastRTPS, but I’m not sure about Connext. Also, FastRTPS currently uses the introspection API, which will be slower than statically generated code for doing the conversions. We can likely get a huge speed-up there by providing a static version of the type support for FastRTPS.
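To illustrate the idea, here is a conceptual sketch of publishing pre-serialized data, using the serialization API as it later appeared in rclcpp (nothing like this existed in the rmw layer at the time of this discussion):

```cpp
#include <memory>
#include "rclcpp/rclcpp.hpp"
#include "rclcpp/serialization.hpp"
#include "rclcpp/serialized_message.hpp"
#include "sensor_msgs/msg/point_cloud2.hpp"

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("serialized_pub");
  auto pub = node->create_publisher<sensor_msgs::msg::PointCloud2>("points", 10);

  sensor_msgs::msg::PointCloud2 cloud;
  // ... fill cloud ...

  // Serialize the ROS message struct straight to a CDR buffer ...
  rclcpp::Serialization<sensor_msgs::msg::PointCloud2> serializer;
  rclcpp::SerializedMessage serialized;
  serializer.serialize_message(&cloud, &serialized);

  // ... and hand the raw buffer to the middleware, skipping the
  // ROS-struct -> DDS-struct convert step entirely.
  pub->publish(serialized);

  rclcpp::shutdown();
  return 0;
}
```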

Ran the code in composition mode (both depth_to_pointcloud and astra_driver as libs, using UniquePtr to pass the Image), and that didn’t seem to make much difference.

Switched back to 2-node mode, and connected via Ethernet instead of Wifi (disabled Wifi, connected a dummy cable to another machine in the lab), and the performance was way better. So clearly that ~40% on wl was getting in the way.