@gavanderhoorn I’m very glad you asked! Making a status post has been on my list of things to do and has been consistently pre-empted so I appreciate that you prompted one.
In one sentence: It works, there are some issues inherited from the Trusty buildfarm as well as some new ones, and more changes are coming.
Significant open issues
Reprepro performance has decreased significantly
This is causing a significant bottleneck for low-level packages in the very large rosdistros, kinetic and indigo. Unless a farm is rebuilding the entire rosdistro this may not be too much of a problem. Until we have time to pick up the investigation, we’re managing it on the canonical buildfarm by throttling the number of concurrent jobs when low-level packages are rebuilding.
Systemd not restarting jenkins java process
In its early days, the new buildfarm gave the Jenkins java process too much of the total system memory, and spikes during high-load periods would trigger the oom-killer. Worse still, the buildfarm would not come back on its own. I’m pretty sure the issue here is that the init script created by the jenkins puppet module is not properly configured to be managed via xenial’s systemd/init compatibility. I plan to resolve this by writing an explicit systemd service for Jenkins rather than using the one built into the puppet module.
I would like to work with the community to settle on a branching/versioning model that satisfies two needs: ours, to keep build.ros.org operating smoothly with live changes conducted properly through configuration management, and the community’s, to have configuration management that doesn’t change drastically week over week. I’m very open to suggestions from the community here. I’d be fine adopting semantic versioning outright, adopting basic versioning, or maintaining, with the help of a community team, stable and latest branches of the buildfarm deployment repositories.
The buildfarm_deployment repository has seen a lot of activity recently, primarily because improving and maintaining it is one of my core responsibilities. In order to facilitate the move to xenial I paid down quite a bit of technical debt in the form of duplication. I also made some refinements which had implications beyond my understanding at the time and which required further changes down the line. With outstanding issues on the Trusty buildfarm becoming increasingly pronounced, I also dropped some features from the initial “release” of the xenial branch in order to perform the migration.
The largest of the postponed features is currently in progress as ros-infrastructure/buildfarm_deployment#167 and will enable deploying a ROS buildfarm on a single host, rather than the three needed to run the complete buildfarm today.
The deployment scripts had overlapping configuration values for the different roles, and in order to realize a single-host buildfarm the configuration “API”/structure will need to change as well. So the current configuration values will require later changes to keep up with master when that pull request merges. I’m happy to open a discussion on Discourse, or in a GitHub issue, to go into further detail on the branching model. Which do folks prefer?
The xenial branches are remnants and will be removed at a future date. The buildfarm_deployment xenial branch was merged by https://github.com/ros-infrastructure/buildfarm_deployment/pull/158 and has not been deleted yet to accommodate hosts that autoreconfigure based on the xenial branch. I think my ROS 2 farm is the primary culprit here. Per advisory comments, I was waiting to delete the branches to give folks testing them time to move onto master, which is the branch that currently sees all new development.
I’ve been trying to get a ros_buildfarm instance up and running using the latest updates, and I’ve faced two major issues; I still haven’t been able to solve one of them.
The first one might sound really silly. I used to install jenkins from its own repository, which at the time offered a newer version (2.73.3, though 2.89.1 is now the most recent). However, the default configuration of buildfarm_deployment_config uses version 2.60.3 of jenkins.
I filled the configuration files with the data from the servers I was going to use and proceeded to install and reconfigure the ros_buildfarm. Once it finished, apparently with no errors, I wasn’t able to reach jenkins through its web interface. I then checked apache and the open ports: Apache was installed, and port 80 was open, though only on IPv6.
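For anyone wanting to repeat that kind of check, one option is to attempt a TCP connection over each address family from the server itself. This is only a minimal sketch using Python's standard `socket` module; the helper name `accepts_connections` is made up for illustration and is not part of ros_buildfarm or buildfarm_deployment. On Linux, a listener that shows up as IPv6 may still accept IPv4 connections through a dual-stack socket, so a direct connection attempt can be more telling than the port listing alone.

```python
import socket


def accepts_connections(host, port, family, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds.

    `family` is socket.AF_INET for IPv4 or socket.AF_INET6 for IPv6.
    Any socket-level failure (refused, timed out, family unsupported)
    is reported as False.
    """
    try:
        with socket.socket(family, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            sock.connect((host, port))
        return True
    except OSError:
        return False
```

Calling `accepts_connections("127.0.0.1", 80, socket.AF_INET)` and `accepts_connections("::1", 80, socket.AF_INET6)` on the host would show whether the web server answers on each loopback address.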
Finally, after finding the puppet log file by chance (I had never used puppet before and didn’t know it had a log file), I saw that it contained an error about a non-existent jenkins version. Once the configuration file was changed to 2.73.3, the whole process worked properly. It would have been nice to get an error message regarding an unavailable jenkins version, or maybe some hints about the existing log file or the possible errors.
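To illustrate the kind of early, explicit error being asked for, here is a minimal sketch of a pre-flight check in Python. It is not how buildfarm_deployment works today: the function name `check_configured_version` is hypothetical, and the list of available versions is assumed to be supplied by the caller (for example, gathered from the package repository by some other means). The version numbers are simply the ones mentioned in this post.

```python
def check_configured_version(configured, available):
    """Return `configured` if it is installable, else raise a clear error.

    `available` is a list of version strings such as "2.73.3" that the
    caller has obtained elsewhere (an assumption of this sketch).
    """
    if configured in available:
        return configured
    if not available:
        raise ValueError("No Jenkins versions are available to install.")
    # Compare numerically so that 2.89.1 ranks above 2.73.3.
    newest = max(available, key=lambda v: tuple(int(p) for p in v.split(".")))
    raise ValueError(
        f"Configured Jenkins version {configured!r} is not available; "
        f"the newest available version is {newest!r}. "
        "See the puppet log for details."
    )
```

With `available = ["2.73.3", "2.89.1"]`, passing `"2.73.3"` returns it unchanged, while passing `"2.60.3"` raises a `ValueError` that names both the missing version and the newest one on offer.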
The second problem is related to job generation. After cloning ros_buildfarm and ros_buildfarm_config and filling in the data, I’ve not been able to generate jobs due to a Python syntax error (although this might not be the place, just in case, I’ve uploaded Python’s output). This has me currently blocked, and I don’t know how to solve it, so some hints or documentation would also be nice.
@inigomartinez glad you’re using the ROS buildfarm scripts and thanks for sharing your experience and issues doing so. If you could please open an issue on https://github.com/ros-infrastructure/buildfarm_deployment for the jenkins LTS version that would be awesome. Feel free to open issues on that repository if anything else doesn’t work as documented during the deployment process.
Additionally, per our support guidelines please ask questions on ROS Answers. Adding the buildfarm tag to your question is usually enough to notify folks who can help.