Changes to buildfarm_deployment for Jenkins JEP-200

For the past several months, the example buildfarm_deployment_config has pinned the Jenkins version so that deployments would not pick up releases that comply with JEP-200, due to issues with the ros_buildfarm scripts and the plugin versions they use. (First reported by @gavanderhoorn in https://github.com/ros-infrastructure/buildfarm_deployment/issues/193)

I’ve been working on upgrading the buildfarm deployment and ros_buildfarm scripts to work on post-JEP-200 deployments and am happy to report that we’re now using the latest Jenkins LTS on both build.ros.org and build.ros2.org.

That work is being prepared for merging across three pull requests:

To allow time for community members using the master branch of buildfarm_deployment to make the transition, we won’t be merging these changes right away. The important dates are:

  • 2018-12-17 buildfarm_deployment master branch will require Jenkins >= 2.138.3 and updated plugins.
  • 2019-01-17 buildfarm_deployment transitional branch jenkins-lts-upgrade will be removed from buildfarm_deployment.

For those who aren’t able to schedule the upgrade before 2018-12-17, the pre-jep-200 branch is a snapshot of the current master that won’t receive further updates. Moving your existing deployments to this branch will keep them in their current state for as long as you need.

Additionally, I’ve written the guide below based on my experience updating build.ros.org and build.ros2.org, with review and testing from some intrepid community members. I’ve done my best to communicate everything I encountered during the upgrade process, but I can’t guarantee that the guide will cover every contingency. If you do run into trouble, feel free to open an issue or ask a question with the buildfarm_deployment tag on https://answers.ros.org.

ROS Buildfarm Deployment JEP-200 Upgrade Guide

The last LTS version prior to JEP-200 was 2.89.4 which has been the pinned version for new deployments since https://github.com/ros-infrastructure/buildfarm_deployment_config/pull/31.
The current Jenkins LTS as of this writing is 2.138.3.

Overview

Due to the way buildfarm_deployment is put together, re-running puppet or the reconfigure.bash script on the Jenkins master after initial configuration is not recommended (see #160).
Operators wishing to update existing deployments will need to take manual steps on their Jenkins hosts to migrate successfully to the latest version of ros_buildfarm (forthcoming).

Notable changes

User-facing changes

Aside from some minor changes to the Jenkins sign in page and other visual updates, most users of ROS Buildfarm deployments will not notice any significant differences.
The minor issue #166 has been resolved with updates to the GitHub API and GitHub Pull Request Builder plugins.
Users with Jenkins API tokens may need to recreate them after coordinating with operators.

Operator-facing changes

JEP-200 mandated that a large number of plugins used by the buildfarm be updated,
so I took the opportunity to update all of them and test comprehensively rather than trying to work out only those updates that are necessary.

Jenkins upstream recommends revoking existing “Legacy” API tokens and replacing them with tokens from the new token system.
Legacy tokens are not revoked on upgrade, and there’s a dedicated UI for reviewing and revoking them, so you can reprovision them as you have the opportunity. See https://jenkins.io/blog/2018/07/02/new-api-token-system/ for details.
ROS Buildfarm aspects which can make use of API tokens:

  • Swarm agent client authentication
    If your organization’s clone of the buildfarm_deployment_config repository uses GitHub for authentication, switching to use of Jenkins API tokens from GitHub API tokens will reduce the number of GitHub API calls necessary to manage authentication to your buildfarm deployment.
    The token can be used for this config value.
  • ~/.buildfarm/jenkins.ini may contain a Jenkins token rather than a password
    Especially if your buildfarm deployment uses GitHub for authentication, it’s recommended that operators create and use Jenkins API tokens rather than GitHub API tokens for authenticating to your buildfarm when running ros_buildfarm scripts.
    A GitHub API token requires your buildfarm to authenticate the token with GitHub on each request, which can exhaust your Jenkins instance’s GitHub API rate limit on large deployments.
    Using a Jenkins API token to avoid a GitHub API request for each authentication is also anecdotally faster, resulting in faster reconfigurations.

Actions Required

Before you start

:rotating_light::rotating_light::rotating_light: Make backups and test them! :rotating_light::rotating_light::rotating_light:

Unless you’ve made a reliable backup before you start I cannot recommend proceeding with this guide.
Backups aren’t real until you’ve validated the restore procedure.
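
As a rough illustration of what validating a restore can look like, the sketch below archives a directory and then unpacks the archive into a scratch location to compare it against the original. The function name backup_and_verify and the archive path in the usage comment are placeholders of mine, not part of buildfarm_deployment. It assumes Jenkins is stopped while it runs and that the archive is written outside the directory being backed up.

```shell
#!/usr/bin/env bash
# Sketch only: archive a directory, then prove the archive unpacks to
# identical content. backup_and_verify is a hypothetical helper name.
backup_and_verify() {
  local src_dir="$1" archive="$2" restore_dir status
  # Create the archive relative to src_dir so it restores anywhere.
  tar -czf "$archive" -C "$src_dir" . || return 1
  # Unpack into a scratch directory to exercise the restore path.
  restore_dir="$(mktemp -d)" || return 1
  if ! tar -xzf "$archive" -C "$restore_dir"; then
    rm -rf "$restore_dir"
    return 1
  fi
  # diff -r exits non-zero if any file differs or is missing.
  diff -r "$src_dir" "$restore_dir"
  status=$?
  rm -rf "$restore_dir"
  return "$status"
}

# Example usage (the archive path is an assumption):
#   backup_and_verify /var/lib/jenkins /srv/backups/jenkins-home.tar.gz
```

A round-trip comparison like this only shows that the archive is complete and readable. Booting a Jenkins instance from the restored copy is a stronger test.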

Review this guide in full.
This guide was written after upgrading build.ros2.org, revised after upgrading build.ros.org, and should be used as a starting point.
There may be other necessary actions for your setup that this guide does not capture.

System requirements: These upgrades are only recommended if you’re running a relatively recent Ubuntu 16.04 based buildfarm that was set up after the buildfarm_deployment migration to xenial or has undergone the procedure described in “ROS Buildfarm October 2017 Guide to new changes”.

Updating buildfarm_deployment_config

Your buildfarm deployment config is likely a duplicate of the master branch from https://github.com/ros-infrastructure/buildfarm_deployment_config. This repository, like https://github.com/ros-infrastructure/buildfarm_deployment, has a master branch which roughly tracks the production deployments build.ros.org and build.ros2.org, and updates to the buildfarm_deployment puppet logic may not be compatible with your existing configuration.
Some changes to deployment behavior are also implemented directly via configuration directives.
To review your buildfarm_deployment_config against the upstream changes made since you copied it, I’d suggest creating a local clone and using git-diff to compare the state of your config against the upstream one.
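
One possible way to do that comparison is sketched here. The helper name compare_with_upstream is hypothetical; the default URL is the upstream repository named above with a .git suffix added. It fetches the upstream branch into FETCH_HEAD without adding a permanent remote, then diffs your checked-out commit against it.

```shell
#!/usr/bin/env bash
# Sketch only: diff a local buildfarm_deployment_config clone against
# upstream. compare_with_upstream is a hypothetical helper name.
compare_with_upstream() {
  local repo_dir="$1"
  local upstream_url="${2:-https://github.com/ros-infrastructure/buildfarm_deployment_config.git}"
  local branch="${3:-master}"
  # Fetch the upstream branch; the result is recorded as FETCH_HEAD.
  git -C "$repo_dir" fetch --quiet "$upstream_url" "$branch" || return 1
  # Lines starting with "-" exist only in your config,
  # lines starting with "+" exist only upstream.
  git -C "$repo_dir" diff HEAD FETCH_HEAD
}

# Example usage (the local path is an assumption):
#   compare_with_upstream ~/buildfarm_deployment_config
```

Uncommitted local changes are not part of this comparison, so commit or stash them first if you want them included.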

Upstream configuration url

Before merge of https://github.com/ros-infrastructure/

Updating Jenkins

  1. Put Jenkins into shutdown mode and let any outstanding jobs complete.

  2. Take Jenkins offline with systemctl stop jenkins. If there are many jobs queued, Jenkins may take a while to shut down. Waiting until the Jenkins java process stops is recommended.

  3. Stop the Jenkins agent on the master with systemctl stop jenkins-slave.

  4. (Optional) Run apt-get update && apt-get upgrade. Review the list of upgraded packages.

  5. Remove any holds placed on the jenkins package and run apt-get update && apt-get install jenkins.

  6. If prompted to update the /etc/default/jenkins file, you may either reject the changes (what I did) or review them to be integrated with your setup.

  7. Install updated plugins. I’ve prepared a script which uses bash and curl to install plugins by dropping them in the correct location. You could also modify this script to use jenkins-cli.jar, or just use it as a list of plugin updates to perform with an online Jenkins instance. When I did so I ran into filesystem permission issues, and the resulting state left my Jenkins unbootable. However, if you check filesystem permissions beforehand to make sure all plugins are owned by jenkins rather than root, it may work for you.

  8. Remove any .pinned files from your Jenkins plugin directory: rm /var/lib/jenkins/plugins/*.pinned. You may save the list of pinned files if you wish to restore them later.

  9. Start Jenkins and watch the logs for any plugins that fail to initialize.

  10. From the Jenkins UI, review any administrator warnings.

  11. (Optional, but recommended) Disable JNLPv2 and JNLPv3 on the Jenkins Global Security Configuration screen.

  12. Sign into Jenkins as the user your agent hosts authenticate as and generate an API token for use by the Buildfarm Swarm Agent. Copy this token into your buildfarm_deployment_config as the jenkins::slave::ui_pass value.

  13. SSH Credentials can no longer be read from a file on disk due to this vulnerability. Credentials using the key from disk get migrated to “Directly entered” keys without the key being read from disk. Very early or modified deployments of the buildfarm may have been using an ssh key read from a file. This can be fixed with the following steps:

  • Copy the text of your ssh key from the buildfarm_deployment_config master.yaml field jenkins::private_ssh_key. Take care to trim the leading whitespace if you copy it from the YAML block text format.
  • Open the Credentials UI and select the global credential matching your ssh username (jenkins-agent by default). Select “Update”
  • Paste the ssh key text into the text area and select “Save”. No other field should need to be changed.
  14. Reconfigure Jenkins using a compatible branch of ros_buildfarm.
    As of 2018-11-16 the Jenkins LTS work for ros_buildfarm is not yet merged into master or released. The pull request https://github.com/ros-infrastructure/ros_buildfarm/pull/587 has a branch with some workarounds for plugin issues and template updates for plugin versions in configuration structures.

  15. Update the Script Approval whitelist. If you have a highly customized ROS buildfarm deployment, your scriptApproval.xml whitelist of allowed scripts and Groovy signatures might have diverged. To proceed you have three options:
    A. Let jobs fail due to sandbox violations and whitelist them as needed.
    B. Manually apply the additions to the scriptApproval.xml file in buildfarm_deployment to your deployment’s /var/lib/jenkins/scriptApproval.xml and restart Jenkins.
    C. Copy the scriptApproval.xml file to your instance replacing your existing /var/lib/jenkins/scriptApproval.xml and restart Jenkins.
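
The ownership check from step 7 and the pinned-file cleanup from step 8 could be scripted along these lines. This is a sketch with hypothetical helper names (list_misowned_plugins, remove_pinned_markers) and an arbitrary record file location. Only the plugin directory /var/lib/jenkins/plugins and the jenkins owner come from the steps above.

```shell
#!/usr/bin/env bash
# Sketch only: helpers for steps 7 and 8. Both function names are
# hypothetical and not part of buildfarm_deployment.

# Print plugin directory entries NOT owned by the expected user.
# Empty output means ownership looks right.
list_misowned_plugins() {
  local plugin_dir="$1" owner="${2:-jenkins}"
  find "$plugin_dir" -mindepth 1 -maxdepth 1 ! -user "$owner" -print
}

# Save the list of *.pinned marker files to a record file, then remove
# exactly the files that were recorded.
remove_pinned_markers() {
  local plugin_dir="$1" record="$2" marker
  find "$plugin_dir" -maxdepth 1 -type f -name '*.pinned' -print \
    | sort > "$record"
  while IFS= read -r marker; do
    rm -- "$marker"
  done < "$record"
}

# Example usage, run as root with Jenkins stopped (record path is an
# assumption):
#   list_misowned_plugins /var/lib/jenkins/plugins jenkins
#   remove_pinned_markers /var/lib/jenkins/plugins /root/pinned-plugins.txt
```

Keeping the record file gives you the list needed to recreate the markers later if you decide to restore the pins.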

Updating Agent on Master

  1. Download the swarm client jar version 3.14 from this link and place it in /home/jenkins-agent on the master host. Make sure it’s owned by the jenkins-agent user.

  2. Open /etc/default/jenkins-slave and make the following changes:

  • Update the line for JENKINS_SLAVE_JAR to JENKINS_SLAVE_JAR="${JENKINS_SLAVE_HOME}/swarm-client-3.14.jar"
  • Update the line for JENKINS_PASSWORD to match the API token used for step 12 when updating the Jenkins master.
  3. Remove the old swarm client jar: rm /home/jenkins-agent/swarm-client-2.0-jar-with-dependencies.jar

  4. Start the Jenkins agent with systemctl start jenkins-slave and verify that it is able to connect to your master host.
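
If you would rather script the edits to /etc/default/jenkins-slave than make them by hand, something like the following may work. The helper name update_agent_defaults is hypothetical. It assumes each setting sits on its own line starting with the variable name, and that the API token contains no "|" or "&" characters. Writing the password value in double quotes is also my assumption about the file format.

```shell
#!/usr/bin/env bash
# Sketch only: rewrite the two settings named above in the agent
# defaults file. update_agent_defaults is a hypothetical helper name.
update_agent_defaults() {
  local defaults_file="$1" token="$2"
  # Keep a copy of the original so the change is easy to revert.
  cp -p "$defaults_file" "$defaults_file.bak" || return 1
  # The first expression is single-quoted so ${JENKINS_SLAVE_HOME}
  # is written literally and expanded later by the service itself.
  sed -i \
    -e 's|^JENKINS_SLAVE_JAR=.*|JENKINS_SLAVE_JAR="${JENKINS_SLAVE_HOME}/swarm-client-3.14.jar"|' \
    -e "s|^JENKINS_PASSWORD=.*|JENKINS_PASSWORD=\"${token}\"|" \
    "$defaults_file"
}

# Example usage, run as root (the token value is a placeholder):
#   update_agent_defaults /etc/default/jenkins-slave "<your API token>"
```

Review the result against the .bak copy before starting the agent.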

Updating Repo and Agent hosts

  1. Pull the latest version of your buildfarm_deployment_config and run ./reconfigure.bash from the repository root.
  2. Verify that both the building_repository-* and agent-* nodes have successfully reconnected.

If you follow this guide, please report back your successes or any problems you encounter.


While I mentioned that these changes would merge on 17 December, they remained unmerged until today because I was busy with the release of ROS 2 Crystal and the first Crystal patch release.

The jenkins-lts-upgrades branches will remain online for the next week before being removed. Anyone currently using them should be able to switch over to the master branch.


These branches have now been removed from GitHub.