It seems it’s failing to download the rosdistro index (https://raw.githubusercontent.com/ros/rosdistro/master/index-v4.yaml) because it cannot verify the SSL certificate. I think it’s because the Docker container does not have the appropriate root certificates installed, but I’m not sure. Has anyone had a similar issue, or does anyone know what’s going on?
The strange thing is that it affects only the armhf build…
@cottsay investigated this issue last week and found that, for reasons not yet determined, openssl rehash is not doing its job in our containers, but that running c_rehash is enough to kick things into shape.
His minimal reproduction, which I’ve been using to test, involves installing openssl and ca-certificates and running this test command:
QEMU makes tracing the rehash process in the “guest” container impossible, and I wasn’t able to reproduce the problem using either our image or the stock ubuntu:focal image on native hardware. I was, however, also able to reproduce the issue using the “stock” cross-platform image with qemu injected:
docker run -ti -v /usr/bin/qemu-arm-static:/usr/bin/qemu-arm-static --platform armhf ubuntu:focal bash
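Inside a container started this way, one quick check on whether the rehash step did its job is to count the subject-hash links in the CA directory. OpenSSL finds CA certificates in a directory through files named after each certificate's subject-name hash (eight hex digits, a dot, and a sequence number), and those links are what openssl rehash and c_rehash create. The helper below is a hypothetical sketch, not part of the buildfarm tooling; it assumes the Debian/Ubuntu default directory /etc/ssl/certs and only matches file names, without checking that the links point at valid certificates.

```shell
#!/bin/sh
# Hypothetical diagnostic helper: count the OpenSSL subject-hash links
# (names like 1a2b3c4d.0) in a CA directory. A count of zero next to a
# directory full of *.pem files suggests the rehash step did nothing.
count_hash_links() {
  dir="${1:-/etc/ssl/certs}"   # assumed Debian/Ubuntu default location
  count=0
  for f in "$dir"/*; do
    name=${f##*/}              # strip the directory part
    case "$name" in
      # eight lowercase hex digits, a dot, then a sequence number
      [0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f].[0-9]*)
        count=$((count + 1)) ;;
    esac
  done
  echo "$count"
}
```

For example, running count_hash_links /etc/ssl/certs in the emulated container and on native hardware, and comparing both results against the number of .pem files in that directory, would show whether the links are missing only under QEMU.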
As a temporary workaround we added this patch, which runs c_rehash in the armhf containers, but it now looks like we’re getting new errors during the actual build that weren’t there before. I’ll be digging into those tomorrow, with the goal of getting us back to building, and will then circle back to figuring out exactly what is happening.
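For illustration, a workaround along those lines could be a small guard that only runs c_rehash when the hash links are missing. This is a hypothetical sketch, not the actual patch; it assumes the Debian/Ubuntu CA directory /etc/ssl/certs, and REHASH_CMD is an invented variable that lets you swap in another command (for example openssl rehash) without editing the function.

```shell
#!/bin/sh
# Hypothetical sketch of the armhf workaround: if the CA directory has no
# OpenSSL subject-hash links, regenerate them with c_rehash.
# REHASH_CMD is an assumed knob so the rehash command can be swapped out.
ensure_cert_hashes() {
  dir="${1:-/etc/ssl/certs}"       # assumed Debian/Ubuntu default location
  rehash="${REHASH_CMD:-c_rehash}"
  for f in "$dir"/*; do
    case "${f##*/}" in
      # eight lowercase hex digits, a dot, then a sequence number
      [0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f].[0-9]*)
        echo "hash links present in $dir"
        return 0 ;;
    esac
  done
  echo "no hash links in $dir; running $rehash" >&2
  # Left unquoted on purpose so a value like "openssl rehash" splits into words.
  $rehash "$dir"
}
```

In an armhf image this would run once, after openssl and ca-certificates are installed; checking first keeps the step a no-op wherever openssl rehash already worked.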
To summarize day two: we still haven’t identified the exact problem. When running /usr/bin/qemu-arm-static -strace /usr/bin/openssl rehash to see the interpreted system calls, the problem does not occur, making this a textbook heisenbug! Interestingly, invoking the interpreter explicitly without -strace (/usr/bin/qemu-arm-static /usr/bin/openssl rehash) also doesn’t trigger the issue.
The issue doesn’t occur on an 18.04 host. I didn’t have 18.10, 19.04, or 19.10 images to test with, but it does seem to be an interaction between the host kernel/libc, qemu, and the container userspace.
Once we’d gotten that far, we switched to trying a second workaround. With the 20.04 deployment, one of the things we enabled was native ARM agents. When testing them we found an issue with the bundled default seccomp policy for Ubuntu 20.04, which also reminded me to include the personality system call used by sbcl. That change has been added to our native ARM machine and will soon be part of the cookbook. With it we were able to get a successful build of Nbin_ufhf_uFhf__rosfmt__ubuntu_focal_armhf__binary #21 on Jenkins, and when ros-infrastructure/ros_buildfarm_config pull request #195 (“Fix label configuration for noetic armhf.” by nuclearsandwich) is deployed, we’ll be using the native ARM agents for all our Noetic armhf builds.
Tomorrow I’m going to write up what we’ve found on the qemu-discuss list, hoping for advice or for confirmation that we’ve found a bug worth formally reporting. If the trail goes cold there, we may not have the availability to keep investigating the issue under QEMU, as long as the native agents are working for us.