Interesting. I am not sure what needs to be secured from, or exposed as an API to, the public network, but here are my thoughts. I hope they help a bit.
Both of these can be monitored: metrics (CPU, memory, etc.) are collected and can then be visualized as graphs (alerting is a plus).
I would use Prometheus Node Exporter to collect these statistics, or possibly OpenTelemetry. We can also use Grafana for monitoring and visualization, and Alertmanager to handle alerts raised from those statistics.
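To make the alerting idea concrete, here is a minimal sketch of the kind of check an alert rule would perform on Node Exporter data. It parses a snippet of the Prometheus text exposition format and flags high memory usage. The sample values, the function names, and the 0.9 threshold are assumptions for illustration only; in a real setup this logic would live in a Prometheus alert rule rather than in custom code.

```python
# Sketch only: evaluates a memory-usage threshold on Node Exporter style output.
# SAMPLE is made-up data; the two node_memory_* names are standard Node Exporter gauges.

SAMPLE = """\
# TYPE node_memory_MemTotal_bytes gauge
node_memory_MemTotal_bytes 8e+09
# TYPE node_memory_MemAvailable_bytes gauge
node_memory_MemAvailable_bytes 2e+09
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.6
"""


def parse_metrics(text: str) -> dict[str, float]:
    """Parse unlabelled samples from Prometheus text exposition format."""
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, HELP/TYPE comments, and labelled series (not needed here).
        if not line or line.startswith("#") or "{" in line:
            continue
        parts = line.split()
        samples[parts[0]] = float(parts[1])
    return samples


def memory_used_ratio(samples: dict[str, float]) -> float:
    """Fraction of memory in use, derived from total and available bytes."""
    total = samples["node_memory_MemTotal_bytes"]
    available = samples["node_memory_MemAvailable_bytes"]
    return 1.0 - available / total


def should_alert(samples: dict[str, float], threshold: float = 0.9) -> bool:
    """Hypothetical alert condition: memory usage above the threshold."""
    return memory_used_ratio(samples) > threshold


if __name__ == "__main__":
    metrics = parse_metrics(SAMPLE)
    print(memory_used_ratio(metrics), should_alert(metrics))
```

The same comparison can be written as a PromQL expression in an alert rule, with Alertmanager then routing the resulting notification.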
If you want to collect ROS 2 statistics, you can refer to the Topic Statistics tutorial in the ROS 2 Foxy documentation. Be advised that these statistics are available in C++ only, and they cannot be collected by Prometheus unless we develop a ROS 2 exporter for Prometheus.
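As a rough idea of what the core of such an exporter might look like, the sketch below turns one topic-statistics message into Prometheus text exposition format. Everything here is an assumption for illustration: the dataclasses only imitate the shape of statistics_msgs/msg/MetricsMessage, the numeric type codes are assumed to match StatisticDataType, and the metric name ros2_topic_statistics is hypothetical. A real exporter would also need a ROS 2 subscription to the statistics topic and an HTTP endpoint for Prometheus to scrape.

```python
from dataclasses import dataclass, field

# Assumed to mirror the constants in statistics_msgs/msg/StatisticDataType.
STAT_NAMES = {1: "average", 2: "minimum", 3: "maximum", 4: "stddev", 5: "sample_count"}


@dataclass
class DataPoint:  # stand-in for statistics_msgs/msg/StatisticDataPoint
    data_type: int
    data: float


@dataclass
class Metrics:  # stand-in for statistics_msgs/msg/MetricsMessage
    measurement_source_name: str  # e.g. the node that measured the topic
    metrics_source: str           # e.g. "message_period" or "message_age"
    unit: str
    statistics: list = field(default_factory=list)


def escape_label(value: str) -> str:
    """Escape backslash, double quote, and newline as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render(msg: Metrics) -> str:
    """Render one statistics message as Prometheus gauge samples."""
    lines = ["# TYPE ros2_topic_statistics gauge"]  # hypothetical metric name
    for point in msg.statistics:
        stat = STAT_NAMES.get(point.data_type)
        if stat is None:
            continue  # skip uninitialized or unknown data types
        labels = (
            f'node="{escape_label(msg.measurement_source_name)}",'
            f'source="{escape_label(msg.metrics_source)}",'
            f'unit="{escape_label(msg.unit)}",'
            f'stat="{stat}"'
        )
        lines.append(f"ros2_topic_statistics{{{labels}}} {point.data}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = Metrics("listener", "message_period", "ms",
                      [DataPoint(1, 100.0), DataPoint(5, 10.0)])
    print(render(example), end="")
```

Using one gauge with a stat label keeps the metric count small; separate metrics per statistic would work just as well and is purely a design preference.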
As the Robolaunch team, we provide a platform that is capable of exactly what you’re looking for, along with some other functionality.
Our main focus is to eliminate the challenges of building robotics applications and operating physical robots through our Robolaunch Cloud Platform. Our robot deployment model is based on git repos and generates robots from their ROS code. Within this scope, users can develop, deploy, and operate their ROS and ROS 2 based robots, handling life-cycle and FCAPS management declaratively.
Unified robotics operations with Robolaunch:
Design: Create 3D robots and worldviews for different use cases
Develop: Robotics software development through a cloud-based IDE, without any manual software or library installation
Simulate & Train: GPU-accelerated reinforcement learning and hyper-realistic simulations
Deliver: Deploy robotics applications to physical and virtual robots and their digital twins declaratively
Operate: Configure, manage, and monitor multiple robots and fleets at run time
About Robolaunch’s network isolation and edge computing architecture: