Status of Xenial migration

@nuclearsandwich: can you comment on the current status of the recently merged Xenial upgrades to ros_buildfarm and related repositories?

I’m starting to deploy a new buildfarm instance and would like to do a Xenial-based one, but seeing the recent PRs against ros_buildfarm and some of the tickets makes me hesitant.

There’s also still a xenial branch. Is that a remnant, or is that also going to be merged at some point?

@gavanderhoorn I’m very glad you asked! Making a status post has been on my list of things to do and has been consistently pre-empted so I appreciate that you prompted one.

In one sentence: It works, there are some issues inherited from the Trusty buildfarm as well as some new ones and more changes are coming.

Significant open issues

  • Reprepro performance has decreased significantly
    This is causing a significant bottleneck for low-level packages in the very large rosdistros, kinetic and indigo. Unless a farm is rebuilding the entire rosdistro this may not be too much of a problem. Until we have time to pick up the investigation, we’re managing it on the canonical buildfarm by throttling the number of concurrent jobs when low-level packages are rebuilding.

  • Systemd not restarting jenkins java process
    The early days of the new buildfarm gave the Jenkins java process too much of the total system memory, and during high-load periods spikes would trigger the oom-killer. Worse still, the buildfarm would not come back on its own. I’m pretty sure the issue here is that the init script created by the jenkins puppet module is not properly configured to be managed via xenial’s systemd/init compatibility. I plan to resolve this by writing an explicit systemd service for Jenkins rather than using the one built into the puppet module.

  • Mercurial development jobs getting triggered too often
    This is an apparent regression that has yet to be investigated and primarily affects buildfarm instances with elastically provisioned workers.

Stability and maturity

I would like to work with the community to settle on a branching/versioning model that satisfies both our need to keep build.ros.org operating smoothly, with live changes conducted properly through configuration management, and the community’s need for configuration management that doesn’t change drastically week over week. I’m very open to suggestions from the community here. I’d be fine adopting semantic versioning outright, adopting basic versioning, or maintaining, with the help of a community team, stable and latest branches of the buildfarm deployment repositories.

The buildfarm_deployment repository has seen a lot of activity recently, primarily because improving and maintaining it is one of my core responsibilities. In order to facilitate the move to xenial I paid down quite a bit of technical debt in the form of duplication. I also made some refinements which had implications beyond my understanding at the time and which required further changes down the line. With outstanding issues on the Trusty buildfarm becoming increasingly pronounced, I also dropped some features from the initial “release” of the xenial branch in order to perform the migration.

The largest of the postponed features is currently in progress as ros-infrastructure/buildfarm_deployment#167 and will enable deploying a ROS buildfarm on a single host, rather than the three needed to run the complete buildfarm today.

The deployment scripts had overlapping configuration values for the different roles and in order to realize a single-host buildfarm the configuration “API”/structure will need to change as well. So the current configuration values will require later changes to keep up with master when that pull request merges. I’m happy to open a discussion on discourse, or in a GitHub issue, to go into further detail on the branching model discussion. Where do folks prefer?

The xenial branches are remnants and will be removed at a future date. The buildfarm_deployment xenial branch was merged by ros-infrastructure/buildfarm_deployment#158 (“Update master to support the Xenial changes”) and has not been deleted yet to accommodate hosts that autoreconfigure based on the xenial branch. I think my ROS 2 farm is the primary culprit here. Per advisory comments, I was waiting to delete branches to give folks testing them time to move off and onto master, which is the branch that currently sees all new development.

The xenial branch in the ros_buildfarm repo was just a leftover. I just removed the branch to avoid confusion.

Hi all,

I’ve been trying to get a ros_buildfarm instance up and running using the latest updates, and I’ve faced two major issues; I still haven’t been able to solve one of them.

The first one might sound really silly. I had been installing Jenkins from its own repository, which at the time offered a newer version (2.73.3, though 2.89.1 is now the most recent). However, the default configuration of buildfarm_deployment_config uses Jenkins version 2.60.3.

I filled in the configuration files with the data from the servers I was going to use and proceeded to install and reconfigure the ros_buildfarm. Once it finished, apparently without errors, I wasn’t able to reach Jenkins through its web interface. I then checked Apache and the open ports: Apache was installed, and port 80 was open, though only on IPv6.

Finally, after finding the puppet log file by chance (I’ve never used puppet before and didn’t know it had a log file), I saw that it contained an error about a non-existent Jenkins version. Once the configuration file was changed to 2.73.3, the whole process worked properly. It would have been nice to get an error message regarding an unavailable Jenkins version, or maybe some hints about the existing log file or the possible errors.
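To make the suggestion concrete, here is a minimal Python sketch of the kind of pre-flight check I have in mind. It is not part of buildfarm_deployment: the function name `check_configured_version` is hypothetical, and the list of available versions is assumed to be supplied by the caller (how to query the package repository for it is left out).

```python
def check_configured_version(configured, available):
    """Return `configured` if it is installable, else raise a readable error.

    Hypothetical helper for illustration only; `available` is assumed to be
    a list of version strings obtained elsewhere.
    """
    if configured in available:
        return configured
    raise ValueError(
        "Jenkins version %s is not available from the package repository; "
        "available versions: %s" % (configured, ", ".join(available))
    )


if __name__ == "__main__":
    # Version numbers taken from the situation described above.
    try:
        check_configured_version("2.60.3", ["2.73.3", "2.89.1"])
    except ValueError as error:
        print(error)
```

Failing early with a message like this, before any provisioning starts, would have pointed me at the configuration file instead of the web server.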

The second problem is related to job generation. After cloning ros_buildfarm and ros_buildfarm_config and filling in the data, I haven’t been able to generate jobs due to a Python syntax error (although this might not be the place, just in case, I’ve uploaded Python’s output). This currently has me blocked, and I don’t know how to solve it, so some hints or documentation would also be nice.

Thank you for your hard work,

Best regards,

Looking through the log you linked:

Invoking '/usr/local/lib/python2.7/dist-packages/ros_buildfarm-1.4.2_master-py2.7.egg/EGG-INFO/scripts/generate_doc_independent_job.py file:///root/ros_buildfarm_config/index.yaml independent-packages --dry-run'

Don’t the ros_buildfarm scripts require Python 3.x (see also ros_buildfarm/doc/environment.rst)?

I had opted to use python2 because the scripts need the python-jenkinsapi package. Only the python2 version of the package is present in Xenial’s repository.

I have removed those and installed their python3 versions from PyPI, and now the error is gone, thanks!

I think that is why the instructions install that package using pip3.
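For anyone hitting the same thing, a small interpreter guard turns the confusing syntax error into a readable message. The following is only a sketch, not something ros_buildfarm ships: the name `check_python_version` and the `MIN_PYTHON` threshold are assumptions, and the guard only helps if the file containing it still parses under Python 2 (hence the `%` formatting instead of f-strings).

```python
import sys

MIN_PYTHON = (3, 0)  # assumption: any Python 3 interpreter is acceptable


def check_python_version(version_info=None, minimum=MIN_PYTHON):
    """Return None if the interpreter is new enough, else an error message."""
    if version_info is None:
        version_info = sys.version_info
    current = tuple(version_info[:2])
    if current >= minimum:
        return None
    # Tuple concatenation yields (min_major, min_minor, cur_major, cur_minor).
    return (
        "Python %d.%d or newer is required, but this is Python %d.%d; "
        "try invoking the script with python3." % (minimum + current)
    )


if __name__ == "__main__":
    error = check_python_version()
    if error:
        sys.exit(error)  # prints the message to stderr and exits non-zero
    print("Python version OK")
```

Run under python2.7, this exits with the message instead of a traceback; under python3 it prints the confirmation line.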

@inigomartinez glad you’re using the ROS buildfarm scripts and thanks for sharing your experience and issues doing so. If you could please open an issue on https://github.com/ros-infrastructure/buildfarm_deployment for the jenkins LTS version that would be awesome. Feel free to open issues on that repository if anything else doesn’t work as documented during the deployment process.

Additionally, per our support guidelines please ask questions on ROS Answers. Adding the buildfarm tag to your question is usually enough to notify folks who can help.

Thanks,
Steven!

@nuclearsandwich: trying to deploy an instance of the Xenial-based farm. Running into some issues and have some questions.

Actual issues I’m posting on the appropriate tracker, but for questions I should probably start a new thread in the Buildfarm category here on Discourse?

@gavanderhoorn Thanks for opening issues. I’ll be sure to look at them when I get to the office.

Questions about running the buildfarm_deployment projects are best asked on answers.ros.org. I’m watching several labels related to the buildfarm including, I think, “buildfarm”.