Our conclusions on trying to implement micro-ros in our robot

I work at a startup (https://www.adinnovations.nl/) that is using ros2 foxy to bring imaging systems to industrial greenhouses in the Netherlands for crop analysis. Recently, we have been looking into micro-ros and seeing if it is a good fit for our operation.

I thought that maybe it would be nice to share our conclusions on the current state of micro-ros with the ROS embedded discourse page! Maybe it will give some new perspectives on where micro-ros should go.

We have a robot that moves in one dimension with an encoder to determine its position along this line. This encoder is attached to an MCU which communicates the position information back to our main device. We also have an MCU connected to a screen and buttons as a user interface.

Currently, we have written custom drivers in Python using pyserial. These drivers use a very simple handshake to tell the MCU to start its information stream, after which the data is sent to the main computer. Once our ROS system has completed its mission, it exits, and the class destructor of the drivers sends one last message to the MCUs to put them back into their handshaking state.
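To make the handshake approach concrete, here is a minimal sketch of such a driver. It is not our actual code: the START, ACK and STOP tokens and the one-integer-per-line position format are hypothetical, and `port` is assumed to be any object with `write`, `readline` and `close`, such as a pyserial `serial.Serial` opened with a read timeout.

```python
class HandshakeDriver:
    """Sketch of a handshake-then-stream serial driver.

    `port` is any object with write(bytes), readline() -> bytes and close(),
    for example a pyserial serial.Serial instance. The b"START", b"ACK" and
    b"STOP" tokens are hypothetical placeholders for the real protocol.
    """

    def __init__(self, port):
        self.port = port
        self.streaming = False

    def start(self):
        # Ask the MCU to leave its handshaking state and begin streaming.
        self.port.write(b"START\n")
        reply = self.port.readline().strip()
        if reply != b"ACK":
            raise ConnectionError(f"handshake failed, got {reply!r}")
        self.streaming = True

    def read_position(self):
        # An empty read (for example a timeout) is treated as a broken link.
        line = self.port.readline().strip()
        if not line:
            raise ConnectionError("no data from MCU")
        return int(line)

    def close(self):
        # Put the MCU back into its handshaking state, then release the port.
        if self.streaming:
            self.port.write(b"STOP\n")
            self.streaming = False
        self.port.close()
```

The sketch uses an explicit `close()` rather than a destructor only to keep the shutdown order easy to follow; the `ConnectionError` is the kind of exception the rest of the system can catch to enter its shutdown state.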

Considering that micro-ros is written by much more capable people than we currently employ, we have been investigating the use of it to replace our own serial communication approach.

We have been using the micro_ros_arduino interface, since we have experience with arduino and platformio.

Here are the conclusions that we have made!

Advantages:

  • We no longer need to write a driver for each different MCU system, since a single micro-ros agent can serve multiple serial devices. This makes it easier to add more MCUs in the future!

  • Less code duplication, since a driver no longer needs to mirror the state machines on the MCU.

  • No large changes, besides the deletion of the drivers, are needed to incorporate micro-ros into the rest of the codebase.

Limitations:

  • Setting configurable parameters at startup and the logging functionality available in ROS are not yet fully supported. Logging can be partially done by publishing to /rosout.

  • The largest problem we have encountered is that the agent is too much of a black box that does not give any insight into its state. This unfortunately makes it very hard to guarantee a safe connection from the main device to the MCU, since the output of the agent is not available. In contrast, with the current custom driver implementation an exception gets thrown if the connection is broken, which we can catch and then bring the ROS system into the corresponding shutdown state.

  • If we were to transition to micro-ros, workarounds would be possible, such as checking an incoming data stream or checking if the nodes are available. Ideally, we think direct feedback from the agent would be preferable.

From our perspective, the workarounds needed to mitigate the disadvantages will need an additional node to monitor the state of the MCU connections. This, in a way, negates some of the advantages, since it will be similar to a driver node. We would need to make either a monitor node per MCU, a driver per different MCU functionality, or a large node that monitors all MCUs.
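As a rough illustration of the "check the incoming data stream" workaround, the sketch below keeps a last-seen timestamp per MCU and reports the ones that have gone quiet. The `LinkWatchdog` class and the MCU names are hypothetical; in a real monitor node, `feed()` would presumably be called from the subscription callback of each MCU topic and `stale()` from a timer.

```python
import time


class LinkWatchdog:
    """Tracks when each MCU last published and reports stale links."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock  # injectable so the logic can be tested
        self.last_seen = {}

    def register(self, mcu_name):
        # Start the timer at registration so a silent MCU is also detected.
        self.last_seen[mcu_name] = self.clock()

    def feed(self, mcu_name):
        # Record that a message from this MCU has just arrived.
        self.last_seen[mcu_name] = self.clock()

    def stale(self):
        # Names of all MCUs whose last message is older than the timeout.
        now = self.clock()
        return sorted(name for name, seen in self.last_seen.items()
                      if now - seen > self.timeout_s)
```

One watchdog instance can cover all MCUs, which corresponds to the "large node that monitors all MCUs" option.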

With these conclusions, we have decided not to apply micro-ros to the encoder MCU yet. We are continuing with a proof of concept for the less critical user interface, since a loss of connection there is not fatal for the whole system, while with the encoder it would be.

There is a good chance that we are missing a key feature that can mitigate the current disadvantages, so tell me if that is the case!

Hello @Trab40,

Based on these conclusions, and also on the issue you opened, we think that micro-ROS needs a mechanism for checking the status of the middleware link between the micro-ROS agent and the client.

This is quite difficult when talking about DDS-XRCE because as stated in the OMG Document, the wire protocol is designed to be used in extreme resource-constrained devices with deep sleep modes. The main consequence of this is that all the communication processes between the client and the agent are started by the client. The agent cannot begin any communication process because the client may be in a sleep state where it cannot receive and/or respond.

But when we talk about micro-ROS, it is true that there are few use cases where the above situation happens. Most use cases are based on micro-ROS nodes that are continuously sending or receiving data or commands via ROS 2 interfaces.

For that reason, we have started to design an approach where the micro-ROS client can tell the agent that its liveliness must be ensured. The micro-ROS agent will then take care of checking that the micro-ROS client is alive and will destroy the micro-ROS session between them if a certain timeout elapses.

We have named it HARD_LIVELINESS_CHECK. It is enabled by default in the micro-ROS Agent and must be enabled by means of a CMake argument in the micro-ROS Client. Also, the liveliness period can be configured in milliseconds.

Once it is enabled, the micro-ROS client and agent will automatically check each other's status. When the transport link is broken or any critical situation on the micro-ROS client side stops the operation, the micro-ROS agent will wait for the configured number of milliseconds before removing all the sessions of that client.

Removing the session just means that the nodes, datawriters and datareaders created by the micro-ROS client will be removed from the ROS 2 graph. This way, by checking the ROS 2 graph you can verify that your micro-ROS client is alive and running.
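A minimal sketch of such a graph check on the main computer is shown below. It only compares two lists of names; in a ROS 2 node the graph list would typically come from rclpy's `Node.get_node_names()`. The node names used here are hypothetical.

```python
def missing_nodes(expected, graph_node_names):
    """Return the expected node names that are absent from the ROS 2 graph."""
    present = set(graph_node_names)
    return sorted(name for name in expected if name not in present)


# In a ROS 2 node this list would come from self.get_node_names().
graph = ["monitor", "ui_node"]
print(missing_nodes(["encoder_node", "ui_node"], graph))  # ['encoder_node']
```

A non-empty result would mean the agent has removed that client's session, so the system can move to its shutdown state. Graph discovery is not instantaneous, so a real check would likely need a grace period at startup.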

Here is the PR where we are working on that:

We will merge this into the micro-ROS mainlines when we finish testing, but any feedback is very welcome.

Thanks!!

I think this is a quite useful feature. Thanks!

Hello @pablogs,

I am a colleague of @Trab40, also working on the micro-ros implementation. First of all, thanks a lot for taking our feedback into consideration. We think that the solution you came up with is a good way to get more insight into the communication: the timeout from HARD_LIVELINESS_CHECK lets us check that the micro-ros client is still connected to the agent. We are excited to test this once it is ready!
Since you said that the check must be enabled by means of a CMake argument in the micro-ROS Client, we were wondering how exactly this will be possible using the micro-ROS Arduino library.
Furthermore, we think it would also add value to the micro-ROS agent if the information that the agent currently only prints to the terminal could be logged using the ROS 2 logger. Let us know what you think of this suggestion!
Thanks!

Hello @Johannap1,

In order to use HARD_LIVELINESS_CHECK in micro-ROS for Arduino, just add -DUCLIENT_HARD_LIVELINESS_CHECK=ON and -DUCLIENT_HARD_LIVELINESS_CHECK_TIMEOUT=5000 to the colcon.meta of your platform and then rebuild the micro-ROS for Arduino library.

You will need to wait until this is merged before you can use it in the Arduino library.
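As a rough sketch of what that colcon.meta change could look like, the snippet below adds the two flags to a colcon.meta structure loaded as a Python dict. It assumes the flags belong in the "cmake-args" list of a package entry named microxrcedds_client; that package name and the existing -DUCLIENT_PIC=OFF entry are assumptions for illustration, so check the colcon.meta of your own platform.

```python
import json

# The two flags named above; 5000 is the liveliness timeout in milliseconds.
LIVELINESS_ARGS = [
    "-DUCLIENT_HARD_LIVELINESS_CHECK=ON",
    "-DUCLIENT_HARD_LIVELINESS_CHECK_TIMEOUT=5000",
]


def enable_liveliness(meta, package="microxrcedds_client"):
    """Add the liveliness flags to one package entry of a colcon.meta dict."""
    # "microxrcedds_client" is an assumed package name, not taken from the thread.
    entry = meta.setdefault("names", {}).setdefault(package, {})
    args = entry.setdefault("cmake-args", [])
    for flag in LIVELINESS_ARGS:
        if flag not in args:  # avoid duplicates when run more than once
            args.append(flag)
    return meta


# Example input standing in for the content of a platform's colcon.meta.
meta = {"names": {"microxrcedds_client": {"cmake-args": ["-DUCLIENT_PIC=OFF"]}}}
print(json.dumps(enable_liveliness(meta), indent=2))
```

Editing the file by hand works just as well; the point is only where the two arguments would end up.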

Regarding the ROS 2 logger output of the Agent: what kind of information would you find useful to have? Maybe we can explore a new module in the micro-ROS Agent that publishes to /rosout, but the current logger prints huge amounts of data when -v6 is enabled.

We agree that -v6 is too much, but -v4 seems to show sufficient information. In addition, the verbosity level can be changed from ros2 run and ros2 launch by the user to suit their needs, as in the launch file micro_ros_agent_launch.py on the galactic branch of micro-ROS/micro-ROS-Agent on GitHub.

With regard to publishing to /rosout, that would be great! The one drawback is that if you do not use rclpy/rclcpp directly, the message will not be saved in the ROS log files in the same format as other ROS logs, which is something that we often rely on when debugging problems. But I guess this depends on individual workflows when it comes to debugging.