ROS Buildfarm October 2017 Guide to new changes

Open Robotics hosts the primary ROS buildfarm at http://build.ros.org, which recently migrated to new hosts running Ubuntu Xenial. In the process we substantially overhauled the configuration management and made some desirable terminology changes, which require intervention when updating the ros_buildfarm Python module and scripts.

Target audience

If you or your organization run a Jenkins instance or cluster that uses the Python libraries and scripts found in the Python package ros-buildfarm (also at https://github.com/ros-infrastructure/ros_buildfarm), the upcoming changes affect you.

Additionally, if your buildfarm machines were originally provisioned with the configuration management tools in https://github.com/ros-infrastructure/buildfarm_deployment and https://github.com/ros-infrastructure/buildfarm_deployment_config, there are significant changes that are of particular importance if you kept the default autoreconfigure settings. Action is required to ensure successful operation even if you are not ready to update at this time.

Timeline

Xenial support will not be merged into master branches earlier than 19 October. Between now and then, I hope you’ll determine which path you’ll take, review the migration guide, and raise any questions or concerns you have before that date.

The ros_buildfarm release needed to perform either upgrade method is still pending. The xenial branch of ros-infrastructure/ros_buildfarm contains the bulk of the likely changes. As soon as is feasible we will create the release and update this guide.

Warnings

  • Do not attempt to re-run reconfigure.bash from buildfarm_deployment on the master host. It will clobber many Jenkins configuration details.

  • Running the xenial branches for buildfarm_deployment and buildfarm_deployment_config on a trusty-based host has never been tried and is not supported. Some modules assume systemd is the service supervision provider and definitely will not work.

CHANGELOG (Abridged)

Updated system software:

  • Ubuntu 16.04 LTS
  • Jenkins LTS 2.60.3
  • Docker CE 17.05
  • Java 8
  • Puppet 3.8

ros_buildfarm

  • Update Jenkins terminology in job names, scripts, system directories, and docs.
  • Add config generation for upload jobs (only for build.ros.org)

buildfarm_deployment

  • Refactor of puppet modules
  • Modules attempt to follow the “Roles and Profiles” pattern and are factored into reusable components.
  • Although this is untested, it should now be easier than before to incorporate profile modules into a separate puppet infrastructure.
  • Updated and pinned to current puppetforge releases for upstream puppet modules.
  • Switched to the puppet future parser (puppet 4.x compatible parser)
  • Retire vendored upstart module in favor of systemd service provider on Xenial.
  • Add script to build reprepro 5.1.1 from backported sourcedeb.
  • Add script to fetch Jenkins plugin versions from build.ros.org and generate a puppet module installing those plugin versions.

buildfarm_deployment_config

  • Unified installed puppet modules across roles.
  • Refactored hiera config to share common data and provide role-specific configuration separately.
  • install_prerequisites.bash now uses system packages for puppet and librarian.
  • reconfigure.bash stores the configured role to prevent accidents when reconfiguring.
  • Update user account and hiera key names for current Jenkins terminology where possible.

Updating ros_buildfarm on existing (Trusty) hosts

It’s possible to update your running hosts with limited configuration changes that will allow them to benefit from changes in subsequent releases. Note that you will not be able to run builds successfully during the migration process.

  1. Rename the local user account from jenkins-slave to jenkins-agent.
  • This is a somewhat system-dependent operation. On a Trusty system, the procedure below should cover most installs.
  • Gracefully stop services running as the jenkins-slave user.
  • Check for running processes with ps -u jenkins-slave
  • If there are remaining non-critical processes, stop them with pkill -u jenkins-slave; otherwise wait for them to shut down gracefully.
  • usermod -l jenkins-agent jenkins-slave
  • groupmod -n jenkins-agent jenkins-slave
  • mv /home/jenkins-slave /home/jenkins-agent
  • usermod -d /home/jenkins-agent jenkins-agent
  2. Change the path to the Jenkins slave jar in /etc/default/jenkins-slave to use the new home directory.

  3. Check the crontab for entries with hard-coded paths to the old home directory.

  4. Apply the label buildagent to all executor nodes with the previous label buildslave. Making this change persist between restarts usually requires changes to /etc/default/jenkins-slave.

  5. Rename the check_slaves job to check_agents via the Jenkins web UI.

  6. Update the ros_buildfarm tools on your buildfarm

  • Using the ros_buildfarm scripts version TBD or greater, run generate_all_jobs.py YOUR_BUILDFARM_CONFIG_URL
  • Review the diff output for potential issues.
  • Commit the changes with generate_all_jobs.py YOUR_BUILDFARM_CONFIG_URL --commit
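
The account rename and the home directory path fix from the steps above can be sketched as a small script. This is a minimal sketch and not a tested procedure: the run helper, the DRY_RUN variable, the pgrep guard, and the sed substitution are illustrative assumptions, and only the usermod, groupmod, and mv commands come from the steps above. By default it only prints what it would do.

```shell
# Sketch only: prints the commands unless DRY_RUN=0 is set (then run as root).
OLD_USER="jenkins-slave"
NEW_USER="jenkins-agent"
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Rename the user, its group, and its home directory (step 1 above).
rename_account() {
    # When really executing, refuse to continue while processes still run
    # as the old user (assumption: pgrep from procps is available).
    if [ "$DRY_RUN" != "1" ] && pgrep -u "$OLD_USER" >/dev/null; then
        echo "processes are still running as $OLD_USER" >&2
        return 1
    fi
    run usermod -l "$NEW_USER" "$OLD_USER" &&
    run groupmod -n "$NEW_USER" "$OLD_USER" &&
    run mv "/home/$OLD_USER" "/home/$NEW_USER" &&
    run usermod -d "/home/$NEW_USER" "$NEW_USER"
}

# Point the defaults file at the new home directory. The substitution is an
# assumption about how the old path appears in the file; review it by hand.
update_defaults() {
    run sed -i "s#/home/$OLD_USER#/home/$NEW_USER#g" /etc/default/jenkins-slave
}

rename_account && update_defaults
```

Reviewing the printed commands first, and only then re-running with DRY_RUN=0, keeps the sketch from touching a host whose layout differs from a default install.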

Changing configuration to avoid breaking changes

With some changes to your buildfarm’s configuration you can continue to use the current (Trusty) configuration management infrastructure and buildfarm scripts until you are ready to perform the upgrade. Potentially you could continue to use the Trusty configuration indefinitely but you will be unable to use newer versions of the ros_buildfarm tools.

  1. Update the auto-reconfiguring host configuration.
  • If your configuration is set up to use the ros-infrastructure/buildfarm_deployment repository directly, you will need to make sure that any hosts with the autoreconfigure: true setting have their autoreconfigure_command updated to use the trusty branch rather than master.
  2. Set ros_buildfarm to use the last release before the xenial-related changes.
  • To preserve the current behavior until you’re ready to upgrade, make sure you’re using version 1.4.1 or earlier.
    build.ros.org is designed to track the master branch of the ros_buildfarm scripts. The master branch and subsequent ros_buildfarm releases use updated terminology that may cause errors or unexpected behavior unless you follow the section “Updating ros_buildfarm on existing (Trusty) hosts”.
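
One way to guard against accidentally picking up a newer release is to compare the installed version against 1.4.1 before running any job generation. The sketch below is an assumption-laden illustration: version_le and check_pin are hypothetical helper names, sort -V requires GNU coreutils, and the pip show line in the comment assumes ros_buildfarm was installed with pip rather than from a Debian package or a git checkout.

```shell
# Highest ros_buildfarm version that keeps the pre-rename terminology,
# according to the guide above.
MAX_VERSION="1.4.1"

# Succeeds when version $1 is less than or equal to version $2.
# Relies on the version sort of GNU coreutils (sort -V).
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Hypothetical check: pass the installed version as the first argument.
check_pin() {
    local installed="$1"
    if version_le "$installed" "$MAX_VERSION"; then
        echo "ok: ros_buildfarm $installed is not newer than $MAX_VERSION"
    else
        echo "warning: ros_buildfarm $installed is newer than $MAX_VERSION" >&2
        return 1
    fi
}

# Example with a literal version. On a pip-based install the version could
# be read with (assumption):
#   pip show ros_buildfarm | awk '/^Version:/ {print $2}'
check_pin "1.4.1"
```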

Migrating to Ubuntu Xenial

We’ve done no testing to support upgrading buildfarm hosts to Ubuntu Xenial in place. The migration to Xenial for Open Robotics was performed by provisioning new hosts running Ubuntu Xenial, running the updated configuration management and migrating the Jenkins and repository data to the new hosts. While all buildfarm deployments would benefit from the improvements in the updated buildfarm stack, particularly large instances, it is not currently necessary to upgrade to Ubuntu Xenial in order to use newer versions of the ros_buildfarm scripts. You can instead follow the section marked “Updating ros_buildfarm on existing (Trusty) hosts”. Our migration followed the basic procedure below:

  1. Provision new Xenial hosts: master, repo, and agent.

  2. Run the Xenial configuration management scripts from https://github.com/ros-infrastructure/buildfarm_deployment_config on the new hosts.

  3. Put Jenkins into Shutdown mode and stop any remaining builds (or let them finish)

  4. Use rsync to copy packages from the existing repo host to the new xenial repo host.

  5. Stop the Trusty Jenkins master and the Jenkins agents on all Trusty machines.
    Archive /var/lib/jenkins. Expect 10-40 MB/s on an AWS machine, depending on compressibility and IO availability.

  6. Stop jenkins-slave and jenkins on Xenial hosts if they were running.

  7. Transfer the archive to the new master (about 10 minutes over the AWS internal network).

  8. Move existing /var/lib/jenkins into /tmp (it should not contain anything worth preserving)

  9. Extract Jenkins archive into /var/lib on the Xenial host

  10. Bring new jenkins master online with migrated config

  11. Run generate_all_jobs.py using version TBD of ros_buildfarm
    Then start the Jenkins agents on the Xenial hosts.
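
The archive, transfer, and restore steps (5 through 10) can be sketched as follows. This is a rough outline under stated assumptions rather than the exact commands used for build.ros.org: the host name, archive path, backup directory name, run helper, and the service name jenkins are all illustrative, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
# Sketch only: prints the commands unless DRY_RUN=0 is set (then run as root).
DRY_RUN="${DRY_RUN:-1}"
NEW_MASTER="${NEW_MASTER:-new-master.example.com}"  # hypothetical host name
ARCHIVE="/tmp/jenkins-home.tar.gz"                  # hypothetical archive path

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Run on the old Trusty master: stop Jenkins, archive its home, copy it over.
archive_old_master() {
    run service jenkins stop &&
    run tar -czf "$ARCHIVE" -C /var/lib jenkins &&
    run rsync -a --progress "$ARCHIVE" "$NEW_MASTER:$ARCHIVE"
}

# Run on the new Xenial master: set the freshly provisioned Jenkins home
# aside, extract the migrated one into /var/lib, and bring Jenkins back up.
restore_on_new_master() {
    run systemctl stop jenkins &&
    run mv /var/lib/jenkins /tmp/jenkins.orig &&
    run tar -xzf "$ARCHIVE" -C /var/lib &&
    run systemctl start jenkins
}

archive_old_master
restore_on_new_master
```

In practice the two functions run on different machines, so each host would call only its own function.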

The “TBD” version of ros_buildfarm needed for the xenial-based hosts mentioned above is 2.0.0. The release announcement is here: New release of the ros_buildfarm package (version 2.0.0)

I either am having a moment with discourse or cannot edit the post directly to update the guide.