Reposted from Exception handling in user callbacks? by meyerj · Pull Request #2017 · ros2/rclcpp · GitHub
I tried to find information about this topic in the documentation, in the code, on GitHub, on Discourse and on ROS Answers, but did not find anything conclusive, or maybe I used the wrong search terms. Only this post and this answer seem to be related. For the special case of service callbacks, I remember having seen a discussion or feature request to forward exceptions to the caller as a special response, like in ROS 1, but I could not find it again.
**User callbacks must never throw?**

They do. I triggered the case by using the `ros1_bridge` with a service server in ROS 1 and a client calling it from ROS 2: if the ROS 1 service is not available anymore, for example because the ROS 1 node died, the callback defined in `ServiceFactory<ROS1_T, ROS2_T>::forward_2_to_1()` throws a runtime error after the roscpp service call API returned false. I also assume that any ROS 2 middleware can throw exceptions when the user callback itself invokes a publisher or service client. Apparently it is even recommended to handle errors by throwing exceptions. So if the rule were that user callbacks must handle exceptions internally, I guess `ros1_bridge` and numerous other node implementations would need to be fixed.
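The bridge failure mode can be modeled without ROS at all. The following self-contained C++ sketch uses hypothetical types (`AddRequest`, `AddResponse`, `Ros1Call`, `Ros2Callback`) and a hypothetical factory `make_forwarding_callback()`. It is not the actual `ros1_bridge` code, only an illustration of a callback that, like `forward_2_to_1()`, turns a `false` return value from the backend call into a `std::runtime_error` and leaves handling to whoever invoked it.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical request/response types standing in for generated service messages.
struct AddRequest { long a = 0; long b = 0; };
struct AddResponse { long sum = 0; };

// Hypothetical stand-in for a ROS 1 service call: returns false on failure,
// for example when the ROS 1 node providing the service has died.
using Ros1Call = std::function<bool(const AddRequest &, AddResponse &)>;

// Simplified, hypothetical ROS 2-style service callback signature.
using Ros2Callback = std::function<void(const AddRequest &, AddResponse &)>;

// Wraps the ROS 1 call in a ROS 2-style callback. A false return value is
// turned into an exception that the callback does not handle itself, so it
// propagates to whoever invoked the callback (in rclcpp: the executor).
Ros2Callback make_forwarding_callback(Ros1Call ros1_call, std::string service_name)
{
  return [ros1_call, service_name](const AddRequest & req, AddResponse & res) {
           if (!ros1_call(req, res)) {
             throw std::runtime_error("ROS 1 service call failed: " + service_name);
           }
         };
}
```

For example, a backend lambda that returns `false` once a flag is cleared makes the next invocation throw, and the exception surfaces in the caller rather than inside the callback.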
**Did I miss a place where this is already handled within rclcpp?**

Even rclcpp code itself may throw exceptions in the `Executor` code path while spinning, for example here. If this is not handled yet, maybe a per-executor, per-node or per-context flag would be nice to have, which decides whether exceptions remain unhandled, as seems to be the case now, or whether rclcpp catches and logs them internally. Or some mechanism to register a user callback that receives an `std::exception_ptr` and whose return value decides whether the executor continues or aborts.
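The exception-handler idea above could look roughly like the following sketch. `ToyExecutor`, `ExceptionHandler`, `set_exception_handler()` and `describe()` are hypothetical names invented for illustration and are not part of rclcpp; the sketch only shows how a handler receiving a `std::exception_ptr` could decide, via its return value, whether spinning continues.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical handler type (NOT an rclcpp API): receives the captured
// exception and returns true to keep spinning or false to abort.
using ExceptionHandler = std::function<bool(std::exception_ptr)>;

// Helper a handler could use to log the message of a captured exception.
std::string describe(std::exception_ptr ep)
{
  try {
    std::rethrow_exception(ep);
  } catch (const std::exception & e) {
    return e.what();
  } catch (...) {
    return "unknown exception";
  }
}

// Toy single-threaded "executor" that runs queued callbacks in order.
class ToyExecutor
{
public:
  void add_callback(std::function<void()> cb) { callbacks_.push_back(std::move(cb)); }

  void set_exception_handler(ExceptionHandler handler) { handler_ = std::move(handler); }

  // Runs each queued callback once and returns how many were started.
  // Without a handler, exceptions escape to the caller, mirroring the
  // behavior described above.
  std::size_t spin_all()
  {
    std::size_t executed = 0;
    for (auto & cb : callbacks_) {
      ++executed;
      try {
        cb();
      } catch (...) {
        if (!handler_) {
          throw;  // no handler installed: let the exception propagate
        }
        if (!handler_(std::current_exception())) {
          break;  // handler asked the executor to abort
        }
      }
    }
    return executed;
  }

private:
  std::vector<std::function<void()>> callbacks_;
  ExceptionHandler handler_;
};
```

Keeping "no handler means rethrow" as the default would preserve the current behavior, so such an option could be purely opt-in.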
**Always catch exceptions when spinning?**

As a last resort, I wanted to patch the main loop of the `dynamic_bridge` (and other nodes) such that exceptions get logged, but the node does not terminate and continues to forward other topics and service calls. But that is not possible without the patch proposed here:

```cpp
// ROS 2 spinning loop
rclcpp::executors::SingleThreadedExecutor executor;
while (ros1_node.ok() && rclcpp::ok()) {
  try {
    executor.spin_node_once(ros2_node);
  } catch (const std::exception & e) {
    // Log the exception and continue spinning...
  }
}
```
The problem is that, in the cycle after the exception, it triggers the “Node has already been added to an executor” exception here, and hence keeps logging in a loop. So maybe the executor needs to be recreated to recover? Or could I call `executor.remove_node(ros2_node)` in the catch body as a workaround? That was the point where I started to investigate the problem and ended up here.

The proposed patch would fix that, I think, by removing the node from the executor before the exception is rethrown, to be handled in `main()` or wherever else `spin_once()` has been called from. I have not actually tested it yet by compiling rclcpp from source. I may also have missed other places where `add_node()` and `remove_node()` get called in pairs. Maybe a better design would involve a RAII-style class that adds a node in its constructor and removes it again in its destructor? It seems that `RCPPUTILS_SCOPE_EXIT()` is meant exactly for those use cases and should be applied instead of my try/catch block, but I only discovered it while writing this.

The same pattern, a loop with `rclcpp::ok()` and `rclcpp::spin_once()` directly in `main()`, can be found in many other places, too, e.g. here. I am not sure whether rclpy is also affected, but in ROS 2 Python examples the equivalent pattern is even dominant. For the simpler `rclcpp::spin(node)` call, an extra loop would need to be added to keep spinning after an exception.
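The add/remove pairing problem and the RAII-style fix can be illustrated with a toy model. `ToyNodeExecutor`, `spin_node_once_fragile()` and `spin_node_once_guarded()` are hypothetical; the real executor tracks nodes differently, and only the error message mirrors the one quoted above. The guarded variant uses a small hand-written guard object instead of `RCPPUTILS_SCOPE_EXIT()`, to keep the sketch free of dependencies.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <stdexcept>
#include <string>

// Toy model of executor/node bookkeeping (hypothetical, not rclcpp code).
class ToyNodeExecutor
{
public:
  void add_node(const std::string & node)
  {
    if (!nodes_.insert(node).second) {
      throw std::runtime_error("Node has already been added to an executor.");
    }
  }
  void remove_node(const std::string & node) { nodes_.erase(node); }
  bool has_node(const std::string & node) const { return nodes_.count(node) != 0; }

  // Fragile: if work() throws, remove_node() is skipped, the node stays
  // registered, and the next call fails in add_node().
  void spin_node_once_fragile(const std::string & node, const std::function<void()> & work)
  {
    add_node(node);
    work();
    remove_node(node);
  }

  // RAII: the guard's destructor removes the node during stack unwinding,
  // similar in spirit to a scope-exit helper.
  void spin_node_once_guarded(const std::string & node, const std::function<void()> & work)
  {
    add_node(node);
    NodeGuard guard{*this, node};
    work();
  }

private:
  // Removes the node from the executor when it goes out of scope.
  struct NodeGuard
  {
    ToyNodeExecutor & exec;
    std::string node;
    ~NodeGuard() { exec.remove_node(node); }
  };
  std::set<std::string> nodes_;
};
```

After one failing cycle, the fragile variant leaves the node registered, so every later call fails in `add_node()`. The guarded variant removes it during stack unwinding, so an outer try/catch loop like the one above can simply keep spinning.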
I can hardly believe that there is no intended or documented way to prevent any minor fault from terminating the whole process, or that this behavior is “by design”. I am sorry in case there is something more obvious and I just missed it.
It is easy to reproduce the crash with the `minimal_service` example in ros2/examples by adding a throw statement in the callback:
```console
$ ros2 run examples_rclcpp_minimal_service service_main &
[1] 353822
$ ros2 service call /add_two_ints example_interfaces/srv/AddTwoInts "{}"
requester: making request: example_interfaces.srv.AddTwoInts_Request(a=0, b=0)
[INFO] [1663789664.837616992] [minimal_service]: request: 0 + 0
terminate called after throwing an instance of 'std::runtime_error'
  what():  some error
^C[1]+  Exit 250                ros2 run examples_rclcpp_minimal_service service_main
$
```