Irel_blocked-releases-page job on local build farm keeps failing

I’ve got a local build farm that is otherwise working fine; we’re releasing and building packages for Indigo, Kinetic, and Melodic without any problems.

The one exception is that several days ago the Irel_blocked-releases-page job started failing, and I can’t figure out why. Here’s the last section of its build log:

# BEGIN SECTION: Run Dockerfile - blocked_releases page
06:05:08 + rm -fr /home/jenkins-agent/workspace/Irel_blocked-releases-page/blocked_releases_page
06:05:08 + mkdir -p /home/jenkins-agent/workspace/Irel_blocked-releases-page/blocked_releases_page
06:05:08 + docker run --rm --cidfile=/home/jenkins-agent/workspace/Irel_blocked-releases-page/docker_generate_blocked_releases_page/docker.cid --net=host -v /home/jenkins-agent/workspace/Irel_blocked-releases-page/ros_buildfarm:/tmp/ros_buildfarm:ro -v /home/jenkins-agent/workspace/Irel_blocked-releases-page/blocked_releases_page:/tmp/blocked_releases_page blocked_releases_page_generation
06:05:10 Checking packages for "indigo" distribution
06:05:23 Traceback (most recent call last):
06:05:23   File "/tmp/ros_buildfarm/scripts/status/build_blocked_releases_page.py", line 44, in <module>
06:05:23     main()
06:05:23   File "/tmp/ros_buildfarm/scripts/status/build_blocked_releases_page.py", line 40, in main
06:05:23     args.output_dir, copy_resources=args.copy_resources)
06:05:23   File "/tmp/ros_buildfarm/ros_buildfarm/status_page.py", line 566, in build_blocked_releases_page
06:05:23     repos_info = _get_blocked_releases_info(config_url, rosdistro_name, repo_names=repo_names)
06:05:23   File "/tmp/ros_buildfarm/ros_buildfarm/status_page.py", line 720, in _get_blocked_releases_info
06:05:23     prev_cache = rosdistro.get_distribution_cache(index, prev_rosdistro_name)
06:05:23   File "/usr/lib/python3/dist-packages/rosdistro/__init__.py", line 173, in get_distribution_cache
06:05:23     data = yaml.safe_load(yaml_str)
06:05:23   File "/usr/lib/python3/dist-packages/yaml/__init__.py", line 94, in safe_load
06:05:23     return load(stream, SafeLoader)
06:05:23   File "/usr/lib/python3/dist-packages/yaml/__init__.py", line 72, in load
06:05:23     return loader.get_single_data()
06:05:23   File "/usr/lib/python3/dist-packages/yaml/constructor.py", line 37, in get_single_data
06:05:23     return self.construct_document(node)
06:05:23   File "/usr/lib/python3/dist-packages/yaml/constructor.py", line 46, in construct_document
06:05:23     for dummy in generator:
06:05:23   File "/usr/lib/python3/dist-packages/yaml/constructor.py", line 398, in construct_yaml_map
06:05:23     value = self.construct_mapping(node)
06:05:23   File "/usr/lib/python3/dist-packages/yaml/constructor.py", line 204, in construct_mapping
06:05:23     return super().construct_mapping(node, deep=deep)
06:05:23   File "/usr/lib/python3/dist-packages/yaml/constructor.py", line 129, in construct_mapping
06:05:23     value = self.construct_object(value_node, deep=deep)
06:05:23   File "/usr/lib/python3/dist-packages/yaml/constructor.py", line 86, in construct_object
06:05:23     data = constructor(self, node)
06:05:23   File "/usr/lib/python3/dist-packages/yaml/constructor.py", line 414, in construct_undefined
06:05:23     node.start_mark)
06:05:23 yaml.constructor.ConstructorError: could not determine a constructor for the tag 'tag:yaml.org,2002:python/str'
06:05:23   in "<unicode string>", line 5773, column 28:
06:05:23         </package>\n", ar_sys: !!python/str "<?xml version=\"1. ... 
06:05:23                                ^
06:05:23 Build step 'Execute shell' marked build as failure
06:05:23 SSH: Current build result is [FAILURE], not going to run.
06:05:23 Sending e-mails to: 
06:05:23 Finished: FAILURE

The cause seems obvious: some kind of failure parsing the '</package>\n", ar_sys: !!python/str' string near the end. But I have grepped through every file on all of our build farm servers and every file in our build farm repositories and cannot find that string anywhere. I’m guessing it’s being generated automatically by something, but I don’t know what, or where to start looking. For reference, the Krel and Mrel versions of that job are working fine.

Does anybody have an idea what might be causing that or where I should start looking?

Thanks in advance!

This looks related to the safe_load change from ros-infrastructure/rosdistro#128 (“use yaml.safe_load for untrusted yaml input”, by dirk-thomas). Did you check the gzipped rosdistro caches? The stack trace goes through the rosdistro function get_distribution_cache. It may be that the cache generation is not dumping safe_load-able YAML.
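
If it helps, here is a rough sketch (Python 3 with PyYAML) of how you could check whether a given rosdistro cache still parses with safe_load. The URL below is the public hydro cache, which is the previous distro the Indigo page compares against; point it at your own mirror if you host the caches locally.

# Sketch: download a rosdistro cache and see whether yaml.safe_load accepts it.
import gzip
import urllib.request

import yaml

# Public cache location; adjust for a locally mirrored rosdistro index.
url = 'http://repositories.ros.org/rosdistro_cache/hydro-cache.yaml.gz'
with urllib.request.urlopen(url) as response:
    yaml_str = gzip.decompress(response.read()).decode('utf-8')

try:
    yaml.safe_load(yaml_str)
    print('cache is safe_load-able')
except yaml.constructor.ConstructorError as error:
    # Same error type as in the Jenkins log above.
    print('cache is NOT safe_load-able:', error)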

As a nitpick, questions like this are probably best asked on answers.ros.org.

The same job has been failing on the official farm since Jan 24th: http://build.ros.org/job/Irel_blocked-releases-page/

It appears that the hydro rosdistro cache contains YAML entries that are not safe_load-able.

 ❯ wget http://repositories.ros.org/rosdistro_cache/hydro-cache.yaml.gz
--2019-02-14 17:40:14--  http://repositories.ros.org/rosdistro_cache/hydro-cache.yaml.gz
Resolving repositories.ros.org (repositories.ros.org)... 52.53.127.253
Connecting to repositories.ros.org (repositories.ros.org)|52.53.127.253|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 325081 (317K) [application/x-gzip]
Saving to: ‘hydro-cache.yaml.gz’

hydro-cache.yaml.gz               100%[============================================================>] 317.46K  --.-KB/s    in 0.08s   

2019-02-14 17:40:14 (4.06 MB/s) - ‘hydro-cache.yaml.gz’ saved [325081/325081]
steven@octobeast:~[127] ❯ gunzip hydro-cache.yaml.gz 
steven@octobeast:~ ❯ grep ar_sys hydro-cache.yaml 
    ar_sys:
      doc: {type: git, url: 'https://github.com/Sahloul/ar_sys.git', version: hydro-devel}
      source: {type: git, url: 'https://github.com/Sahloul/ar_sys.git', version: hydro-devel}
    </package>\n", ar_sys: !!python/str "<?xml version=\"1.0\"?>\n<package>\n\t<name>ar_sys</name>\n\
    >http://wiki.ros.org/ar_sys</url>\n\t<author email=\"sahloul@race.u-tokyo.ac.jp\"\

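For what it’s worth, here is a minimal sketch of why that entry now trips the job: PyYAML’s SafeLoader (what safe_load uses) has no constructor registered for the python/str tag, while the full Loader does. The string below is just a short stand-in for the embedded package.xml text in the real cache entry.

import yaml

# Stand-in for the hydro cache entry; the real value is a full package.xml string.
snippet = 'ar_sys: !!python/str "some embedded package.xml text"'

# The full loader knows how to construct python/str ...
print(yaml.load(snippet, Loader=yaml.Loader))

# ... but SafeLoader, which rosdistro's get_distribution_cache now uses, does not.
try:
    yaml.safe_load(snippet)
except yaml.constructor.ConstructorError as error:
    print(error)  # could not determine a constructor for the tag ... python/str
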
My inclination would be to just disable the job, since we’re not going to modify the hydro distribution cache and, at least on the official farm, Indigo is approaching its end of support.

cc @tfoote for their thoughts.

Yeah, I think disabling that job makes sense. Indigo is already far ahead of hydro in package count, and tracking new releases into it is not a priority at this point. Most of the packages listed by the job have been specifically chosen not to be ported.