Upcoming maintenance for build.ros2.org: 2020-11-25

Edit 2020-11-23: Rescheduled for 2020-11-25T16:00:00Z

Edit: Postponed a few more days. See Upcoming maintenance for build.ros2.org: 2020-11-25

build.ros2.org will be the first of our build farms to update to Ubuntu 20.04 using the chef-based ros buildfarm deployment utilities (Work in progress: Chef cookbooks for ROS build farm configuration). This will require some downtime, during which rosdistro PRs for ROS 2 distributions will be held and dev and PR jobs will not be triggered for ROS 2 distributions.

The ROS 2 build farm doesn’t have an entry on the ROS status page. If I can’t manage to add one before the start of the maintenance window I will use this thread to post updates.

Once the migration is complete, similar operations will be scheduled for build.ros.org and ci.ros2.org.

Would that be 2020-10-29T15:00:00Z to 2020-10-30T03:00:00Z?

Edit: not any more.


That is a cool feature! Yes.

However, I’m actually going to have to postpone the migration. There are a couple of components that I found out just a few hours ago aren’t quite ready and I’ll want to use the weekend to get them there.

I plan to proceed with this migration Sunday, Monday, or Tuesday (Pacific time, UTC-8, because Daylight Saving Time ends on Sunday in the US) and will post an updated time once I know it.


I was getting ready to do this on Thursday until it was pointed out to me that I will be out of the office later this week, which would make it hard for me to perform the migration.

I’ve updated the topic to remove the date. Before I can set a new one I need to verify the ROS 2 sync / release schedule is clear. Once there is a new date set I will update this thread.

The new plan is to perform the migration starting at 2020-11-25T16:00:00Z.

If the current rebuild of Dashing has not completed by the scheduled start time tomorrow, maintenance will begin when it completes.

Just waiting for the armhf jobs to finish now.

Seems you got things migrated: congrats :+1:

The email status updates went a bit like this: “problem encountered -> solved”, but never really provided any insight into how things were resolved.

For those of us also considering migrating our systems: were any of the problems you encountered things you consider completely local to your setups @nuclearsandwich, or would a brief description here be warranted to potentially save some (future) buildfarm admin some time?

Thanks! Things are still a work in progress. Some of the issues are ones I did not hit in our development or staging setups; they only showed up when trying to keep a busy production service running.

Thanks for the feedback. As we finish the tail end of this migration and prepare for the build.ros.org migration I’ll try to provide a bit more in the status updates, if only to provide better breadcrumbs for myself and the community to expand upon in a full review.

My plan for the path ahead is to get the changes made on our production branch reviewed and into the latest in preparation for a tagged release of the cookbook, get the chef workflow that we have been iterating on internally publicly documented, and then write up a migration guide like the one I wrote around the time of our Ubuntu Xenial migration: ROS Buildfarm October 2017 Guide to new changes.

Far and away the largest challenges have been caused by the desire to preserve the artifacts that were created prior to the migration. If you’re willing to scrap your old data, setting up a new buildfarm is approaching straightforward, except that the chef workflow is new and different relative to the previous buildfarm deployment workflow and requires documentation.
There were some outright bugs or missing features in the new config that only turned up when we pulled in all the production data and saw “that doesn’t look right”.
There have also been a few unforeseen stability issues. An import package job failed due to a timeout waiting for the gpg agent to start up, and although the error was recoverable I can’t recall it happening previously. I also saw about 50% of the fleet’s docker daemons crash last night within several minutes of each other and I do not know why. The issue which caused last night’s shutdown is that the chef resource for creating Jenkins credentials is not idempotent and causes Jenkins to try, unsuccessfully, to find an older version of the same credential. I’ve worked around that for now but it still needs a better solution.
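
For anyone unfamiliar with the term, here is a rough sketch of what "idempotent" means for a resource like the credential one mentioned above. It is not the chef resource or the Jenkins API: `Credential`, `CredentialStore`, and `ensure` are hypothetical names invented for illustration, and the store is just an in-memory dict. The idea is that converging on an already-correct state should change nothing, so repeated runs never leave the service looking for an older version of the same entry.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Credential:
    """Hypothetical credential record; not the Jenkins data model."""
    credential_id: str
    username: str
    secret: str


@dataclass
class CredentialStore:
    """In-memory stand-in for a credential store (an assumption for this sketch)."""
    entries: dict = field(default_factory=dict)
    writes: int = 0  # number of times the store was actually modified

    def ensure(self, desired: Credential) -> bool:
        """Converge on the desired state; return True only if a change was made."""
        current = self.entries.get(desired.credential_id)
        if current == desired:
            return False  # already in the desired state: do nothing
        self.entries[desired.credential_id] = desired
        self.writes += 1
        return True


store = CredentialStore()
cred = Credential("deploy-key", "buildfarm", "s3cret")

first = store.ensure(cred)    # store is empty, so this writes
second = store.ensure(cred)   # same desired state, so this is a no-op
rotated = store.ensure(Credential("deploy-key", "buildfarm", "n3w"))  # changed secret, writes again
```

A non-idempotent version would write (or delete and recreate) on every run regardless of the current state, which is the kind of behaviour that can leave other components holding references to something that no longer exists.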


I’ve finally closed the maintenance event on https://status.ros.org/. I haven’t had to make a hotfix to production for the last ten or so days, and while there are still things outstanding we are looking pretty stable. I’m now shifting my focus to getting the hotfixes merged and creating a release of the cookbook in preparation for moving build.ros.org during the end-of-year holidays.
