Windows CI builds are now officially containerized and virtualized on ci.ros2.org

Greetings all,

I’m excited to announce the official start of Windows containerized builds on the ROS 2 buildfarm, ci.ros2.org. We have been testing these builds for a couple of months now and have seen improvements in reliability and maintainability over manually provisioned bare-metal instances. ROS 2 itself is reliable in both bare-metal and containerized Windows environments, but executing Jenkins jobs in containerized environments adds many benefits, including:

  • Jenkins has a much easier time cleaning up processes and files from each build for a sanitized build and test environment
  • Like the Linux docker builds, the Windows dockerfile stays up-to-date with the latest Visual Studio, Qt, and Chocolatey releases to help catch incompatibilities before users find them. (ci #383, rclcpp #963, rclcpp #1000)
  • It also serves as self-documenting instructions for installing a Windows ROS 2 development environment

These benefits will lead to a better overall experience for Windows ROS 2 users and developers alike by ensuring ROS 2 is reliably tested through CI on a current Windows environment.

You can find the new dockerfile and an automated Qt installer script at the public ros2 ci repo. The dockerfile has been custom written for ci.ros2.org, but it should provide a good template and launching point for anyone looking to work with their own containerized Windows ROS 2 environments.
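For anyone who wants to experiment locally, using the dockerfile as a template boils down to a standard Docker build-and-run workflow. A minimal sketch, assuming a checkout with the dockerfile in the current directory (the image tag here is just illustrative, not the one ci.ros2.org uses):

```
# Build the Windows image from the dockerfile in the current directory
docker build -t ros2-windows-ci .

# Start an interactive container from that image
docker run --rm -it ros2-windows-ci cmd
```

Note that these commands have to run on a Windows host with Windows container support enabled.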

We have also taken this opportunity to run these containerized builds from cloud VM instances, which provides further maintainability benefits and easy scalability. Now poorly behaving Windows Jenkins agents can be unceremoniously removed from the agent pool (or simply restarted).

Best regards,
Stephen


Congratulations and happy to see this Windows CI improvement. :slight_smile:


That’s awesome!

I’ve been tinkering with Windows containers in the hope of getting a Docker image for ROS 2 package testing (basically the equivalent of osrf/ros2:nightly but on Windows) but never got around to finishing it.
Using your work as a base will save a tremendous amount of time!

Can you share some insight into the limitations (if any) of the container approach compared to running natively?
e.g. Is it possible to run graphical apps in the container? Or to run tests requiring a frame buffer (thinking RViz and RViz plugins)?

I’m sorry I missed your questions regarding this announcement, but I am very happy that this work is useful to you as well! One note: we don’t install the Python dependencies in the dockerfile because they are installed through the run_ros2_batch.py script, so you’ll have to add those as necessary.

It is possible to run graphical applications in the container, but nothing will be displayed. So anything you run has to be able to run unattended, and Windows users like their graphical installers. Fortunately, every installer we’ve needed has had an unattended option. For graphical tests, I wouldn’t expect any issues with frame buffers, but I’m not 100% sure about that, or how they would be handled in a test. All of the rviz tests run as expected in the container so far. We’ve started discussions about tests requiring GPU support, and it appears to be supported in Windows containers, but I haven’t started that investigation yet.

There appears to be a small performance hit (10%-15% longer build/test times) when running these builds and tests in a container in the cloud, but I think that’s primarily from running in a virtual environment. We’re also seeing timing-related tests that had already exhibited a certain amount of flakiness fail more regularly. There hasn’t been a test that fails only in the containers; the existing flaky ones just fail more often there. For example, test_timer.py and test_rate.py in rclpy.

There are limitations with regard to container/host version compatibility. Because the container and host share the same kernel, you must run the container with the same Windows build version as the host (1809, 1903, 1909, etc.). It is possible to run a docker container with Hyper-V isolation, which enables you to run an older container version than the host (1803-1909 only, though). See this page for more info. However, we’ve had a separate issue with Hyper-V isolation.
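To make the version-mismatch workaround concrete, here is what Hyper-V isolation looks like on the command line. A sketch only; the base image tag is illustrative, and you would pick one matching the version you want to test:

```
# Run an older (e.g. 1809) container on a newer host by using Hyper-V
# isolation instead of sharing the host kernel
docker run --rm --isolation=hyperv mcr.microsoft.com/windows/servercore:1809 cmd /c ver
```

With the default process isolation, the same command would fail on a host running a different Windows build.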

We’ve been forced to use process isolation when running the docker container (--isolation=process) because we mount a directory from the host that allows the Jenkins slave agent to read all the build and test results. It turns out that the service that handles pdb file access for debug builds, mspdbsrv.exe, somehow fails when accessing pdb files in this mounted directory if the container is not run with process isolation. Process isolation didn’t work properly with the 1809 image, and we were only able to use it once we switched to 1903/1909. With 1809, we had to build in a directory inside the container and then copy those results to the mounted directory. This affects all builds (even though it’s a debug build issue) because cmake runs a simple test build in debug mode to verify the compiler.
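For reference, the mounted-results setup is just a bind mount combined with process isolation. A sketch with illustrative paths, image name, and build command (not the actual ci.ros2.org configuration):

```
# Mount a host workspace so the Jenkins agent can read build/test results;
# process isolation is what lets mspdbsrv.exe access pdb files in the mount
docker run --rm --isolation=process -v C:\jenkins\workspace:C:\ws my-windows-ros2-image
```

Under Hyper-V isolation the same mount is visible, but pdb access through mspdbsrv.exe fails, which is what forced the process-isolation requirement described above.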

Oof, that’s a lot, but I think that covers it all.

Stephen