Design By Contract

We already discussed the idea of a “chaos node” that sends random messages (with content that follows the message format) to existing topics and services, which helps discover “unsafe nodes” during development.
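
To make the “random but well-formed” part concrete, here is a minimal sketch in plain Python, with no ROS dependency. The format dictionary, the field names, and the `random_value` / `random_message` helpers are all hypothetical stand-ins for a real message definition. An actual chaos node would read the field types from the message class and publish the result on a topic.

```python
import random
import string

# Hypothetical message format: field name -> primitive type name.
# It only stands in for a real ROS message definition.
EXAMPLE_FORMAT = {
    "linear_x": "float64",
    "angular_z": "float64",
    "frame_id": "string",
    "seq": "uint32",
    "enabled": "bool",
}


def random_value(type_name, rng):
    """Return a random value that is valid for the given primitive type."""
    if type_name == "float64":
        return rng.uniform(-1e6, 1e6)
    if type_name == "uint32":
        return rng.randint(0, 2**32 - 1)
    if type_name == "bool":
        return rng.random() < 0.5
    if type_name == "string":
        length = rng.randint(0, 32)
        return "".join(rng.choice(string.printable) for _ in range(length))
    raise ValueError("unsupported field type: " + type_name)


def random_message(fmt, rng=None):
    """Build a dict with one random, type-correct value per field of fmt."""
    if rng is None:
        rng = random.Random()
    return {name: random_value(type_name, rng) for name, type_name in fmt.items()}


if __name__ == "__main__":
    # A fixed seed makes a chaos run reproducible, which helps when a node breaks.
    print(random_message(EXAMPLE_FORMAT, random.Random(42)))
```

Seeding the generator is a deliberate choice: if a random message breaks a node, the same seed should produce the same message again.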

But here I notice your focus is on post-mortem analysis of “systems in production”. Maybe we could parametrize the “chaos node” so that it can run in production, just as server farms run potentially destructive tests in production, but only when demand is low?

Post-mortem also means that the system is running in production, crashes, and we want to know what happened. I am not aware of any tool for this purpose yet… Each node can use the tools of the programming language it was written in, but for the communication between nodes, maybe we should have a library that bags each message and keeps it for a certain amount of time? That is, some instrumentation that node writers could add to their code, so that when a node crashes, we can get the log of all messages received (and sent) by that node?
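
A rough sketch of the kind of instrumentation I have in mind, again in plain Python and not tied to any existing ROS API. `MessageRecorder`, its `record` / `dump` methods, and the `retention_s` parameter are hypothetical names. The assumption is that a node would call `record` from its subscriber callbacks and around its publish calls, and that `dump` would be written to disk from a crash handler.

```python
import time
from collections import deque


class MessageRecorder:
    """Keep the messages a node received or sent during the last retention_s seconds."""

    def __init__(self, retention_s=30.0, clock=time.monotonic):
        self.retention_s = retention_s
        self._clock = clock       # injectable so the recorder can be tested
        self._entries = deque()   # (timestamp, direction, topic, message), oldest first

    def record(self, direction, topic, message):
        """Store one message; direction is "in" (received) or "out" (sent)."""
        now = self._clock()
        self._entries.append((now, direction, topic, message))
        self._prune(now)

    def _prune(self, now):
        # Entries are in time order, so expired ones are always at the left end.
        while self._entries and now - self._entries[0][0] > self.retention_s:
            self._entries.popleft()

    def dump(self):
        """Return the retained entries, oldest first, for post-mortem inspection."""
        self._prune(self._clock())
        return list(self._entries)


if __name__ == "__main__":
    recorder = MessageRecorder(retention_s=5.0)
    recorder.record("in", "/scan", {"ranges": [1.0, 2.0]})
    recorder.record("out", "/cmd_vel", {"linear_x": 0.2})
    for entry in recorder.dump():
        print(entry)
```

Keeping only a time window bounds the memory cost per node, at the price of losing older context. Whether a fixed window is enough for a useful post-mortem is an open question.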

Is anyone aware of such a lib/module, or is it maybe just an extra feature we could add to the existing core ROS libs?