We’re a Docker/Compose shop too, with the general approach being one container per major system: control, commanding, path planning, comms, perception, etc. Setting the restart policy to unless-stopped lets us make sure the robot is up and ready to use right after bootup. I particularly like using Docker because each container can run a different image. One drawback (for us anyway) is that the images can get quite large and cumbersome to update.
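For anyone who wants to picture the layout, here is a rough sketch of what a container-per-system Compose file could look like. The service and image names are made up for illustration and are not our actual setup; the only part taken from the description above is the `restart: unless-stopped` policy on each service.

```yaml
# docker-compose.yml (sketch only; service and image names are hypothetical)
services:
  control:
    image: example/robot-control:latest      # hypothetical image
    restart: unless-stopped                  # comes back after reboot unless manually stopped
  path_planning:
    image: example/robot-planning:latest     # hypothetical image
    restart: unless-stopped
  comms:
    image: example/robot-comms:latest        # hypothetical image
    restart: unless-stopped
  perception:
    image: example/robot-perception:latest   # hypothetical image; can use a different base image
    restart: unless-stopped
```

Note that this policy only brings containers back once the Docker daemon itself has started, so the daemon needs to be enabled at boot for the robot to come up on its own.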
In terms of monitoring, we do monitor onboard, but anything from robot to console has been largely built in house, as we have significantly less bandwidth available than most (subsea robots communicating over acoustic communications). I’d definitely be interested in hearing other folks’ approaches, especially in communications-starved situations.