How do you monitor your robot diagnostics (topic rates)?

I also see great value in monitoring diagnostics at runtime, all the time. For example, when the robot did not behave as expected, being able to check the diagnostics data has helped a lot in troubleshooting the issue faster.

This, however, does not give you any guarantee that it's actually working correctly.

Right, this is what I was getting at.

Sure, but are those diagnostics part of your actual operation, or do you only access them when something goes wrong? I see this as an important distinction because a diagnostic tool may carry much more data than the robot actually needs to perform its task. It's akin to your car's diagnostics: do you need to view them in real time to drive the car, or do you just need a small handful of non-diagnostic messages?

What do you mean by being part of the actual operation?

On a general level, I've been using the Diagnostics framework to collect diagnostics data from different sources and publish it to the /diagnostics topic all the time. This information can be part of the operation, either by stopping the robot if something goes wrong, as @JM_ROS mentioned, or by showing part of the information to the user to indicate that everything is OK. If I understood your question correctly, this is the closest I come to diagnostics being used as part of the actual operation. Otherwise, I would consider it only a troubleshooting tool, as I don't have any other examples where I'd use it as part of the application logic.
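To illustrate the "stop the robot or show a status to the user" idea, here is a minimal sketch in plain Python so it runs without ROS. The level constants mirror those defined in `diagnostic_msgs/DiagnosticStatus` (OK=0, WARN=1, ERROR=2, STALE=3); the `Status` class, the `summarize` function, and the policy that ERROR or STALE halts the robot are hypothetical illustrations, not the poster's actual setup.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

# Level values mirror diagnostic_msgs/DiagnosticStatus.
OK, WARN, ERROR, STALE = 0, 1, 2, 3


@dataclass
class Status:
    """Hypothetical stand-in for one entry of a /diagnostics message."""
    name: str
    level: int
    message: str


def summarize(statuses: Iterable[Status]) -> Tuple[bool, str]:
    """Return (halt_robot, user_message) for one batch of statuses."""
    problems = [s for s in statuses if s.level != OK]
    # Hypothetical policy: ERROR or STALE halts the robot, WARN is only reported.
    halt = any(s.level in (ERROR, STALE) for s in problems)
    if not problems:
        return halt, "All systems OK"
    # Show the operator only the most severe problem.
    worst = max(problems, key=lambda s: s.level)
    return halt, f"{worst.name}: {worst.message}"
```

For example, `summarize([Status("lidar", WARN, "noisy"), Status("motor", ERROR, "overcurrent")])` yields `(True, "motor: overcurrent")`. The UI needs only the short message, while the full diagnostic detail stays available for troubleshooting.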

I will let @JM_ROS speak for themself, but I think the comment was geared more towards monitoring the rate of a topic, not necessarily diagnostics. What I'm getting at is that in a production environment, diagnostics shouldn't be part of the nominal operation.

This seems to drift off a bit, but just to answer your questions:
Diagnostics and error monitoring are related, at least in my opinion.
We use a variant of the lifecycle nodes that have dedicated error states.
Each node monitors the rate at which data is received, and if this is wrong, it goes into a dedicated error state. Nodes may also set other arbitrary errors for other reasons.
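A minimal sketch of per-node rate monitoring, in plain Python so it runs without ROS. It assumes the node records a receive timestamp (in seconds) per message and calls `check` periodically, for example from a timer. `RateMonitor`, `NodeState`, and the `[min_hz, max_hz]` bounds are hypothetical names; the real lifecycle-node variant described above is not public. The error state is latched, since leaving it would be a separate, explicit transition.

```python
from collections import deque
from enum import Enum


class NodeState(Enum):
    ACTIVE = "active"
    ERROR = "error"


class RateMonitor:
    """Flags a topic whose receive rate leaves [min_hz, max_hz] or goes stale."""

    def __init__(self, min_hz: float, max_hz: float, window: int = 10):
        self.min_hz = min_hz
        self.max_hz = max_hz
        self.stamps = deque(maxlen=window)  # most recent receive times (s)
        self.state = NodeState.ACTIVE

    def on_message(self, stamp: float) -> None:
        """Call from the subscription callback."""
        self.stamps.append(stamp)

    def check(self, now: float) -> NodeState:
        """Call periodically; enters (and stays in) ERROR on a bad rate."""
        if len(self.stamps) < 2:
            # Not enough data yet; a real node would also want a startup timeout.
            return self.state
        span = self.stamps[-1] - self.stamps[0]
        rate = (len(self.stamps) - 1) / span if span > 0 else float("inf")
        # No message for longer than one slowest-allowed period counts as too slow.
        stale = now - self.stamps[-1] > 1.0 / self.min_hz
        if stale or not (self.min_hz <= rate <= self.max_hz):
            self.state = NodeState.ERROR
        return self.state
```

With `RateMonitor(min_hz=8.0, max_hz=12.0)` and messages arriving every 0.1 s, `check` stays `ACTIVE`. If the publisher stops, the staleness test trips after 0.125 s without data.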

On top of this sits a system state monitor. The moment any node switches to a state that is not expected, the whole system stops all motors and goes into fault. Human-readable error messages are derived from the individual node states and shown on the HMI.
So I would say we have a hybrid of diagnostics and error detection.
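The supervisor layer can be sketched as follows, again in plain Python and under stated assumptions. Node states are plain strings, `stop_motors` is a hypothetical callback into the motor controllers, and the `MESSAGES` table mapping (node, state) pairs to HMI text is illustrative only. Any node reporting a state other than its expected one, including a node the monitor does not know about, triggers a single motor stop and a fault.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical (node, state) -> operator-facing text for the HMI.
MESSAGES: Dict[Tuple[str, str], str] = {
    ("lidar_driver", "error"): "Lidar data rate out of range. Check the sensor cable.",
}


class SystemMonitor:
    """Stops all motors as soon as any node leaves its expected state."""

    def __init__(self, expected: Dict[str, str], stop_motors: Callable[[], None]):
        self.expected = dict(expected)  # node name -> expected state
        self.stop_motors = stop_motors
        self.faulted = False
        self.hmi_messages: List[str] = []

    def on_state_change(self, node: str, state: str) -> None:
        if state == self.expected.get(node):
            return  # nominal, nothing to do
        if not self.faulted:
            self.stop_motors()  # stop first, explain afterwards
            self.faulted = True
        # Derive a human-readable message, with a generic fallback.
        self.hmi_messages.append(
            MESSAGES.get((node, state), f"Node '{node}' entered unexpected state '{state}'.")
        )
```

Stopping before building the message keeps the safety reaction independent of the HMI text. Later faults only add messages and do not re-trigger the stop.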
