I work at a place that runs a number of robotics projects with a variety of purposes, developed in a fairly agile fashion. Recently we've realized that we are having a lot of issues with robot testing (our robots are usually not connected to the internet, only to a local robot network), because multiple people work on different problems on the same robot (occasionally using the same code base). The vast majority of the problems we've had to solve came from someone changing a number of settings to test their own code; when the next person tries to run the robot, those settings cause it not to run as expected. That's why we are going for a complete overhaul of our QA and testing system. I am already aware of current QA solutions for software in general, but I am particularly interested in testing and QA processes for ROS, real robots, and simulation.
I've already gone through the ROS, ROS 2, and ROS-Industrial QA documents, but I would appreciate more input on how you lead your QA and testing efforts. I'm dropping some specific questions below, but I'm more than happy to hear general advice as well.
What does your git branching look like? Do you have different branches or different repos for simulation and real robots? What does your git flow look like? We write ROS packages in separate repositories and add them as submodules to the robot project in yet another repository; where and how do you keep your robot code?
How do you decide whether a particular piece of code or ROS package is stable and ready to be released, and what kind of testing do you do on robots to determine stability?
How much simulation testing do you perform before robot testing?
Is it a good idea to do all on-robot testing in Docker containers?
How do you keep your robots free from unstable code?
What kinds of processes run in your CI? (unit tests, automatic Docker builds, compiling in different environments, format checking, etc.)
On which platform and how do you write your documentation, and how detailed is it? (We usually write a one-page README and a few pages of LaTeX report on theory per ROS package.)
We use GitHub flow, which is just a master branch with feature branches. For a release we tag master. If we want to maintain an old version, we sometimes create a release branch.
All code for one type of robot lives in a single repo. It contains all launch files for both simulation and hardware. These are in different packages so that a subset of the packages can be released to the customer (you don't want Gazebo on the real robot).
We have a lot of generic ROS nodes that live outside the project repo and can be reused when needed.
QA
We have a CI setup with unit & component integration tests. Most developers work on their feature in simulation first. After that a final test is performed on a robot (if needed).
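To make that kind of CI feasible, it helps to keep node logic in plain functions that can be unit-tested without a ROS master. A minimal sketch of what we mean (the `clamp_velocity` function is a hypothetical example, not from our codebase):

```python
import unittest

def clamp_velocity(v, limit):
    """Clamp a commanded velocity to +/- limit.

    Plain Python logic with no ROS dependency, so the test below
    runs in CI without a roscore or simulator."""
    return max(-limit, min(limit, v))

class TestClampVelocity(unittest.TestCase):
    def test_within_limit(self):
        self.assertEqual(clamp_velocity(0.3, 1.0), 0.3)

    def test_above_limit(self):
        self.assertEqual(clamp_velocity(2.5, 1.0), 1.0)

    def test_below_negative_limit(self):
        self.assertEqual(clamp_velocity(-2.5, 1.0), -1.0)
```

Run with `python -m unittest` in CI; the node itself is then just a thin wrapper that subscribes, calls the function, and publishes.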
We do a lot of linting (roslint, catkin_lint, etc.) so that the code is already clean and reviews only have to look at code structure.
When a release is close, we perform the final system test on the robot; we have a checklist for that.
Deployment
Sometimes the software on the development robot is quite a mess. We have a policy that all changes on the robot are "volatile", so you can always revert everything to master if needed. This is one of the things we would like to improve. Maybe rsyncing your workspace to the robot?
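A minimal sketch of what such a sync script could look like, assuming a hypothetical robot hostname (`robot1`) and workspace paths; the key point is `--delete`, which makes the robot mirror your local tree exactly, so "reverting" is just syncing a clean checkout again:

```python
import subprocess

def make_rsync_cmd(local_ws, robot_host, remote_ws, dry_run=False):
    """Build the rsync invocation that mirrors a local workspace onto
    the robot. --delete removes files on the robot that you deleted
    locally, which is what keeps the on-robot state easy to revert."""
    cmd = ["rsync", "-az", "--delete",
           "--exclude", "build/", "--exclude", "devel/", "--exclude", "logs/"]
    if dry_run:
        cmd.append("--dry-run")
    # trailing slash on the source: sync the *contents* of the workspace
    cmd += [local_ws.rstrip("/") + "/", f"{robot_host}:{remote_ws}"]
    return cmd

def deploy(local_ws, robot_host, remote_ws, dry_run=False):
    subprocess.run(make_rsync_cmd(local_ws, robot_host, remote_ws, dry_run),
                   check=True)
```

Calling `deploy("/home/dev/catkin_ws/src", "robot1", "~/catkin_ws/src")` would then push the current checkout; a `dry_run=True` pass first shows what would change.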
We don't test in Docker; that would hinder debugging too much. We could release from Docker, and we are working towards that.
Documentation
We write READMEs in Markdown, sometimes with a design doc. For now this is enough. We use GitLab for linking between them. Quite simple, but I don't know how scalable this will be in the future.
Thank you very much for the detailed answer! I am happy to hear that most of what I had in mind for setting up our infrastructure matches yours. I really liked the idea of a "volatile" code policy on the robot, so that reverting is easy.
I was considering more Docker-based development, but you are probably right that developing in Docker would increase the time spent on debugging. Maybe it is easier to use Docker just for releases.
Also, do you have an apt server set up for pulling stable releases, or do you just use git with tags for that purpose?
Same as the ROS branching strategy: branches like 'kinetic-devel', with feature branches off that on per-developer forks. These are rebased onto e.g. kinetic-devel regularly to keep the history clean.
Each developer has an account on the development robots with a catkin overlay on top of the default stack, which has all passed QA and just works. When another developer logs in for their testing, the basis they work on is not borked by someone else's stuff.
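For anyone unfamiliar with overlays, the per-account setup would look roughly like this (paths are hypothetical; the important part is that the overlay is built with the QA-approved underlay sourced, so it chains back to it automatically):

```shell
# one-time setup of a per-developer overlay workspace
source /opt/robot_base_ws/devel/setup.bash   # QA-approved underlay, shared read-only
mkdir -p ~/overlay_ws/src
cd ~/overlay_ws && catkin_make               # overlay now chains the underlay

# in the developer's ~/.bashrc: only the overlay needs sourcing,
# since it pulls in the underlay it was built against
source ~/overlay_ws/devel/setup.bash
```

Packages built in `~/overlay_ws` shadow same-named packages in the base underneath, so each developer can break only their own environment.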
For simulation, that is just an arg to the launch files, which simply includes some different launch files underneath. If you need different branches for simulation, I'd consider that bad practice.
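A sketch of that pattern in a ROS 1 launch file (package and file names are hypothetical): one top-level entry point, with only the driver layer switched by the arg.

```xml
<!-- robot.launch -->
<launch>
  <arg name="sim" default="false"/>

  <!-- only the bottom layer differs between simulation and hardware -->
  <include file="$(find my_robot_bringup)/launch/gazebo.launch" if="$(arg sim)"/>
  <include file="$(find my_robot_bringup)/launch/hardware.launch" unless="$(arg sim)"/>

  <!-- everything above the drivers is identical in both cases -->
  <include file="$(find my_robot_bringup)/launch/navigation.launch"/>
</launch>
```

Then `roslaunch my_robot_bringup robot.launch sim:=true` runs the exact same code in Gazebo, so there is never a reason for a separate simulation branch.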
My team really dislikes submodules, so we use .rosinstall files to tie our workspaces together. These are also used by Travis and ROS-Industrial CI to build the dependencies of a package.
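For reference, such a file is just a YAML list of repositories (names and URIs below are hypothetical):

```yaml
# .rosinstall
- git:
    local-name: my_robot_driver
    uri: https://github.com/example/my_robot_driver.git
    version: kinetic-devel
- git:
    local-name: generic_sensor_tools
    uri: https://github.com/example/generic_sensor_tools.git
    version: 0.4.1
```

Locally, `wstool update -t src` clones or updates everything listed; in CI, industrial_ci can be pointed at the same file (via its upstream-workspace setting) so the build uses exactly the same dependency versions as the developers.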
Industrial CI also runs pylint checks for Python 2/3, all the tests, etc.
In terms of testing and QA: we have a really good QA dude who is very critical. It takes time to get past him, but it keeps the standard high.
We're also using atf (github.com/floweisshardt/atf) to keep tabs on performance in some simulated scenarios, e.g. can the robot still navigate through some environment within X time.
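The core idea is a time-budget assertion around a scenario. A generic sketch of that kind of check (this is not atf's actual API; `fake_navigation` is a stand-in for launching the simulation and waiting for the goal):

```python
import time

def assert_within_time(task, limit_s):
    """Run a task and fail if it exceeds its time budget -- the kind
    of performance regression check atf automates for simulated runs."""
    start = time.monotonic()
    result = task()
    elapsed = time.monotonic() - start
    if elapsed > limit_s:
        raise AssertionError(f"took {elapsed:.2f}s, budget was {limit_s}s")
    return result

def fake_navigation():
    # stand-in for "navigate through the environment"; a real test
    # would start the simulator and block until the goal is reached
    time.sleep(0.01)
    return "goal_reached"
```

Running `assert_within_time(fake_navigation, 30.0)` in CI then catches a change that silently makes navigation slower, not just one that breaks it outright.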
With regard to running software on robots in a Docker container: it is possible to run a container in host network mode. That way it uses the network interfaces of the host (rather than an isolated network), and you get the full bidirectional communication on all ports that ROS needs for two systems to talk to each other.
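In docker-compose terms that is a single setting (image, service name, and addresses below are hypothetical):

```yaml
# docker-compose.yml
services:
  ros_stack:
    image: my_robot:melodic
    network_mode: host   # share the host's interfaces: no port mapping,
                         # full bidirectional traffic on all ports
    environment:
      - ROS_MASTER_URI=http://localhost:11311
      - ROS_IP=192.168.1.10   # the robot's address on the robot network
```

The plain-CLI equivalent is `docker run --network host my_robot:melodic`. Note that host networking works this way on Linux hosts, which is what you'd have on a robot anyway.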
One challenge I am currently thinking about is how to maintain a production release while developing new features. The debugging tools might get updated and no longer be compatible with the release version. This is especially true across ROS distributions. I am currently contemplating whether it would be possible, and a good idea, to have all developer tools also be available in a Docker environment, so that I can for instance connect rviz running on a laptop with Ubuntu 20.04 to a system running Bionic.
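A sketch of what such a tools image could look like, assuming the robot runs Melodic on Bionic (the exact tool selection is just an example):

```dockerfile
# Dockerfile for a debugging-tools image matching the robot's distro
FROM ros:melodic
RUN apt-get update && \
    apt-get install -y ros-melodic-rviz ros-melodic-rqt-common-plugins && \
    rm -rf /var/lib/apt/lists/*
```

On the 20.04 laptop you'd then start it with host networking and X forwarding, roughly `docker run --network host -e DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix <image> rviz` (X access permitting, e.g. after `xhost +local:docker`), so the GUI tools match the robot's distro while the laptop moves on.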
Thanks a lot for sharing your development process.
I've never thought of having different user accounts on robots. I think it is a nice and easy way to isolate developers' work.
I’ve never heard of atf, but it is looking great and I will be sure to play with it a little.
Your tool is also looking cool, and I think it is a smart way to generate edge cases and push the software a little harder. I will also try it in my free time and try to give some feedback.
That sounds like an interesting approach. How do you manage those accounts on the development robots? Does everybody have sudo? Are the users created automatically?
Each developer has sudo rights, so you still have to take care not to screw things up. There is a script to set up a new developer account; not much more management is going on in that regard.
We should have done the same for our AMIGO and HSR robots at TechUnited, @Rayman :-). Would have saved so much trouble all around.
As for best practices: having hot-swappable batteries like AMIGO has is definitely also a very handy practice.