At ADI we are reaching a number of ROS 2 enabled devices (>10) where it is no longer sustainable to manually update the systems we have in the field. This manual process involves SSHing into the device, pulling from the online repository, and building the updated system. I think many of us can recognize this process as error-prone.
In response to this issue, I have been researching the techniques available for updating remote systems automatically.
With this post I hope to make a useful resource for others researching this topic in the future and hopefully also gain some insight into what other companies are using and their experiences.
From my online research I compiled the following (somewhat generic) requirements for automatic deployment to remote edge systems:
- Atomic updates
  - The update either succeeded or failed. Nothing in between that could result in undefined behavior.
- Updates must be schedulable
  - Allows for a gradual rollout.
- Easy to revert to the previous working version
- Allows you to define health checks and what failure means.
  - Sends a notification when a rollout fails, and potentially rolls back automatically to the working version
  - Pauses further phases of the rollout
- Needs to be able to flash new firmware to host-connected microcontrollers
- Some form of configuration per device
- The ability to debug the application on the device if needed
  - Easy access to the file system being used
- Little performance overhead
- Is secure (does not install or execute software created by an attacker).
- Persistent data storage
- Needs to be able to handle flaky network connectivity when updating
- Should be able to render a UI to an attached screen
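
To make the gradual rollout, health check, and rollback requirements more concrete, here is a minimal Python sketch of how they could fit together. It is only a simulation under my own assumptions: `phased_rollout`, `RolloutResult`, and the callback names are hypothetical and do not correspond to the API of Mender, Docker, or any other tool.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RolloutResult:
    updated: List[str] = field(default_factory=list)      # devices now on the new version
    rolled_back: List[str] = field(default_factory=list)  # devices reverted after a failed check
    paused: bool = False                                   # True when later phases were skipped


def phased_rollout(
    phases: List[List[str]],
    apply_update: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
    max_failures: int = 0,
) -> RolloutResult:
    """Update devices phase by phase; pause the rollout if a phase has too many failures."""
    result = RolloutResult()
    for phase in phases:
        failures = 0
        for device in phase:
            apply_update(device)
            if is_healthy(device):
                result.updated.append(device)
            else:
                # Health check failed: revert this device to the previous version.
                rollback(device)
                result.rolled_back.append(device)
                failures += 1
        if failures > max_failures:
            # Do not touch the devices in the remaining phases.
            result.paused = True
            break
    return result


# Simulated fleet (hypothetical device names); "robot-2" fails its health check.
versions = {"robot-1": "1.0", "robot-2": "1.0", "robot-3": "1.0"}
result = phased_rollout(
    phases=[["robot-1"], ["robot-2"], ["robot-3"]],
    apply_update=lambda d: versions.__setitem__(d, "1.1"),
    is_healthy=lambda d: d != "robot-2",
    rollback=lambda d: versions.__setitem__(d, "1.0"),
)
print(result, versions)
```

In this sketch the third phase is never started, so "robot-3" stays on the old version until someone investigates the failure. What counts as "healthy" is left entirely to the `is_healthy` callback, which matches the requirement that you define what failure means yourself.
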
From this research I have concluded that there are two viable options that can meet these requirements: dual copy and containers.
Dual Copy involves splitting a hard drive into two partitions, one active and the other inactive. The active partition runs an image containing everything your application needs. When an update is initiated, the system running from the active partition downloads the new version and writes it into the inactive partition. Once the new version is set up on the inactive partition, the bootloader is pointed to it.
After the next successful reboot, the previously inactive partition runs as the active partition and vice versa. If the boot process fails, the bootloader can be configured to roll back to the known previous working version in the original partition.
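
The partition swap described above can be sketched as a small state model in Python. This is only an illustration of the idea, not how Mender or any bootloader implements it: the class `DualCopyDevice`, the slot names "A" and "B", and the `boots_ok` callback are all assumptions of mine.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class DualCopyDevice:
    slots: Dict[str, str]  # slot name -> version of the image stored there
    active: str = "A"      # slot the system currently boots from

    def inactive(self) -> str:
        return "B" if self.active == "A" else "A"

    def running_version(self) -> str:
        return self.slots[self.active]

    def update(self, new_version: str, boots_ok: Callable[[str], bool]) -> bool:
        """Write the new image to the inactive slot and switch only if it boots."""
        target = self.inactive()
        # The running system is never modified; only the inactive slot is written.
        self.slots[target] = new_version
        # Simulated trial boot into the freshly written slot.
        if boots_ok(new_version):
            self.active = target  # commit: the slots swap roles
            return True
        # Boot failed: the bootloader keeps using the previous slot.
        return False


device = DualCopyDevice(slots={"A": "1.0", "B": "0.9"})
device.update("1.1", boots_ok=lambda v: True)   # succeeds, now running from B
device.update("1.2", boots_ok=lambda v: False)  # fails, still running 1.1 from B
print(device.active, device.running_version())
```

The point of the model is that the update is atomic from the application's point of view: the device runs either the old image or the new one, and a failed update leaves the working slot untouched.
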
Advantages of dual copy:
- If the boot process fails, the bootloader can be configured to roll back to the known previous working version in the original partition
- Is able to update every layer of the operating system
- Can run a containerized system on top, such as Docker
- No need to prompt the user for anything: the update is installed in the background, and the next boot starts the updated OS.
Disadvantages of dual copy:
- Storage needs to be over-provisioned to twice the amount needed for a single copy
- OS upgrades can consume a lot of bandwidth
- I found no examples online of a ROS 2 system using this approach
- The system needs time to reboot after the update
The best contender for the dual copy approach, based on my research, is Mender.
A container is a lightweight, portable, and executable package. Apart from the kernel, which is provided by the host machine, it includes everything that an application needs to run: the application code, libraries, dependencies, and runtime. Containers include only the minimum required components and share the host OS’s kernel. Because of this they are much lighter and more efficient than virtual machines (VMs).
Advantages of containers:
- Because containers include only the minimum required components and share the host OS’s kernel, they are much lighter and more efficient than VMs.
- OSRF publishes ROS 2 Docker images online
- Large players advocate for it: AWS IoT Greengrass has an example that uses Docker on a remote ROS device
- Great for development, since an exact copy of what runs on the development machine can run on the remote device
- Because containers are widely used in web development, many different tools are available.
- The deployment guidelines in the ROS 2 documentation state that “typical deployment scenarios often involve shipping containerized applications, or packages, into remote systems”, which suggests that this approach is not uncommon.
Disadvantages of containers:
- Containers cannot update the kernel
- Out of the box, containers cannot use privileged resources. Getting access requires Docker's --privileged flag, which removes the protective sandbox around the container running on your robot.
- Docker itself doesn’t have much in the way of orchestration, monitoring and deployment, but options like Balena, Portainer and Watchtower exist for this
- Docker does not have delta updates.
- It is considered bad practice to run everything in one container, but then what role is left for the ROS 2 launch system?
- There is no init system (Upstart or systemd) starting syslog, cron jobs and daemons, or even reaping orphaned zombie processes.
- Managing networking and communication between containers and between containers and the host can be complex
- Containers are primarily designed for running command-line applications and services. Running graphical applications within containers can be challenging and may require additional setup.
- there seems to be a potentially overwhelming amount
The best contender for the containers approach is the de facto standard, Docker.
Resources:
- OTA updates for Embedded Linux, part 1 – Fundamentals and implementation
- OTA updates for Embedded Linux, part 2 – A comparison of off-the-shelf update systems
- The ultimate guide to software updates on embedded Linux devices - Mirza Krak
- Airbotics - The landscape of software deployment in robotics
- Deploy and Manage ROS Robots with AWS IoT Greengrass 2.0 and Docker
- Docker Containers and Images for Robot Operating System (ROS)-Based Applications
- Is learning Docker useful in the robotics field?
- ROS Docker; 6 reasons why they are not a good fit | Ubuntu
- An Updated Guide to Docker and ROS 2 - Robotic Sea Bass
- Connecting Remote Robots Using ROS2, Docker & VPN | Husarnet
- Deployment Guidelines — ROS 2 Documentation
- (PDF) ROS and Docker
- Why the future of embedded software lies in containers
- Which Over the air (OTA) update solutions for your embedded system ? The Ultimate Guide | Witekio
- Updating Embedded Linux Devices: Update strategies
Based on my research I am inclined to choose the dual copy approach, specifically using Mender, over the container approach.
This is because:
- Every level of the OS can be configured and updated (e.g. networking, systemd services, kernel)
- Interacting with the system when it is remote sounds easier, since it is just a Linux environment like the one running on my laptop. There is no need to take into account any additional limitations that may apply to the configuration of a Docker image, e.g. GPU interaction or USB pass-through for cameras.
- In the future it may even be possible to upgrade it to an optimized Yocto image
- We can always fall back to a containerized approach, since Docker can easily be installed and the container images moved into the partition
- It doesn’t introduce performance overhead.
- I have a feeling that Docker is much more geared towards isolated applications within a shared operating system, hence the good practice of isolating one process per container. Applying this to ROS 2 nodes, which are not isolated from each other, through Docker Compose completely negates the power of the ROS 2 launch system.
There are two things that are holding me back:
- I am seeing more examples online of Docker being used for ROS 2, both in blogs and from companies such as Amazon and Husarnet
- Dual copy is very focused on embedded Linux. To what extent is our ROS 2 application, primarily written in Python, embedded?
My questions for you:
- What type of deployment system are you using?
- What are its biggest advantages and disadvantages?
- Is there an angle that I may be missing?