Windows CI builds are now officially containerized and virtualized on ci.ros2.org

I’m sorry I missed your questions regarding this announcement, but I am very happy that this work is useful to you as well! One note: we don’t install the Python dependencies in the Dockerfile because they are installed by the run_ros2_batch.py script, so you’ll have to add those as necessary.

It is possible to run graphical applications in the container, but nothing will be displayed. So anything you run has to be able to run unattended, and Windows users like their graphical installers. Fortunately, every installer we’ve needed has had an unattended option. For graphical tests, I wouldn’t expect any issues with frame buffers, but I’m not 100% sure about that, or how they would be handled in a test. All of the rviz tests have run as expected in the container so far. We’ve started discussions about tests requiring GPU support, and it appears to be supported in Windows containers, but I haven’t started that investigation yet.

There appears to be a small performance hit (10%-15% longer build/test times) when running these builds and tests in a container in the cloud, but I think that’s primarily from running in a virtual environment. We’re also seeing timing-related tests that had already exhibited a certain amount of flakiness fail more regularly, for example test_timer.py and test_rate.py in rclpy. There hasn’t been a test that fails only in the containers; the already-flaky tests just show up more often because they are more likely to fail in a container.

There are limitations with regard to container/host version compatibility. Because the container and host share the same kernel, you must run the container with the same Windows build version as the host (1809, 1903, 1909, etc.). It is possible to run a Docker container with Hyper-V isolation, which enables you to run an older container version than the host (1803-1909 only though). See this page for more info. However, we’ve had a separate issue with Hyper-V isolation.
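To make the compatibility rule concrete, here is a minimal sketch of how one might pick an isolation mode from the host and image versions. The function name `pick_isolation` is hypothetical, and it assumes versions are the four-digit release IDs mentioned above (1809, 1903, 1909), which compare correctly as integers. It does not check the 1803-1909 range limit.

```python
def pick_isolation(host_version: str, image_version: str) -> str:
    """Suggest a Docker isolation mode for a Windows container.

    Hypothetical helper: assumes four-digit release IDs such as
    "1809" or "1909", compared numerically.
    """
    host, image = int(host_version), int(image_version)
    if image == host:
        # Same build as the host: the shared kernel matches,
        # so process isolation is an option.
        return "process"
    if image < host:
        # Older image on a newer host: needs Hyper-V isolation.
        return "hyperv"
    # Image newer than the host: not covered by either mode.
    raise ValueError(
        f"image {image_version} is newer than host {host_version}"
    )
```

For example, `pick_isolation("1909", "1909")` suggests `process`, while `pick_isolation("1909", "1809")` suggests `hyperv`.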

We’ve been forced to use process isolation when running the Docker container (--isolation=process) because we mount a directory from the host that allows the Jenkins slave agent to read all the build and test results. It turns out that the service that handles PDB file access for debug builds, mspdbsrv.exe, somehow fails when accessing PDB files in this mounted directory if the container is not run with process isolation. Process isolation didn’t work properly with the 1809 image, and we were only able to use it once we switched to 1903/1909. With 1809, we needed to build in a directory inside the container, then copy those results to the mounted directory. This affects all builds (even though it’s a debug build issue) because CMake runs a simple test build in debug mode to verify the compiler.
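As a rough illustration of that setup, the sketch below assembles a `docker run` command line with process isolation and a host directory mounted into the container. The helper name, image name, and paths are all placeholders invented for the example, not what ci.ros2.org actually uses.

```python
def docker_run_args(image, host_dir, container_dir, isolation="process"):
    """Build an argument list for `docker run` with a mounted directory.

    Hypothetical helper: the image name and both paths are supplied
    by the caller and are placeholders in the usage below.
    """
    return [
        "docker", "run",
        f"--isolation={isolation}",          # process or hyperv
        "-v", f"{host_dir}:{container_dir}",  # host results dir -> container
        image,
    ]


# Placeholder values, for illustration only.
args = docker_run_args(
    "example_windows_ci_image",
    r"C:\ci\workspace",
    r"C:\workspace",
)
print(" ".join(args))
```

The list can be passed to `subprocess.run(args, check=True)` on a machine that has Docker installed.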

Oof, that’s a lot, but I think that covers it all.

Stephen