Packer Job Fails Whenever Run Via Schedule (works when run manually)

I have a perplexing problem which doesn’t appear to be .gitlab-ci.yml- or runner-related, and which doesn’t match any other problem I can see reported.

I have a scheduled CI job which runs Packer. It fails every time the schedule starts it, but works just fine if I run it myself (even if I set CI_PIPELINE_SOURCE=schedule). The weird thing is that the failure happens inside Packer, during an apt update task, on an AWS EC2 instance that is completely unrelated to GitLab or its runner.

The general flow is:

  • Gitlab kicks off the pipeline
  • The first step in the pipeline is to get some AWS temporary credentials - this step works
  • The second step is to run Packer and give it some tasks
    • Packer starts up correctly
    • Packer creates an AWS EC2 instance, and waits for it to start up
    • Packer SSHes onto the instance
    • Packer runs apt update → This fails on a schedule, but works manually
  • Since a pipeline step has failed, a third step runs to clean up (this works correctly)

The gitlab-ci.yml config for the packer step is:

  stage: packer
  image:
    name: hashicorp/packer:latest
    entrypoint: [""]
  script:
    - set
    - apk add gpg
    - cd ubuntu-20.04
    - gpg --dearmour apt-key
    - packer init .
    - packer build -var ci_build_id=${CI_PIPELINE_ID} .
  needs:
    - job: aws-authentication-job
      artifacts: true
  tags:
    - docker

(it’s failing in the packer build... step)
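For reference, the part of the Packer template that fails is a shell provisioner roughly along these lines. This is a hedged sketch only, not the actual template; the file names and copy destinations are inferred from the "Uploading ..." lines in the log below:

```hcl
# Hypothetical sketch -- not the actual template. File names are taken
# from the "Uploading ..." lines in the build log; destinations are guesses.
provisioner "file" {
  source      = "apt-key.gpg"
  destination = "/tmp/apt-key.gpg"
}

provisioner "file" {
  source      = "iops.list-focal"
  destination = "/tmp/iops.list"
}

provisioner "shell" {
  inline = [
    "sudo cp /tmp/apt-key.gpg /etc/apt/trusted.gpg.d/",
    "sudo cp /tmp/iops.list /etc/apt/sources.list.d/",
    "sudo apt update", # <-- the step that fails on scheduled runs only
  ]
}
```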

The pipeline log looks like this:

$ packer build -var ci_build_id=${CI_PIPELINE_ID} .
output will be in this color.
==> Prevalidating any provided VPC information
==> Prevalidating AMI Name: iothic_ubuntu_20.04_golden_20220610_15.03
==> Found Image ID: ami-0d2a4a5d69e46ea0b
==> Creating temporary keypair: packer_62a35d47-2521-67dd-b015-1de9178cddef
==> Creating temporary security group for this instance: packer_62a35d4a-870e-5cab-e4eb-70469e99620c
==> Authorizing access to port 22 from [] in the temporary security groups...
==> Launching a source AWS instance...
==> Adding tag: "CIBuildId": "1511"
==> Instance ID: i-0ecded0731b50c16a
==> Waiting for instance (i-0ecded0731b50c16a) to become ready...
==> Using SSH communicator to connect:
==> Waiting for SSH to become available...
==> Connected to SSH!
==> Uploading apt-key.gpg => /tmp/apt-key.gpg
==> Uploading apt-key => /tmp/apt-key
==> Uploading iops.list-focal => /tmp/iops.list
==> Provisioning with shell script: /tmp/packer-shell1856619453
==> WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
==> Hit:1 focal InRelease
==> Get:2 focal-updates InRelease [114 kB]
==> Get:3 focal-backports InRelease [108 kB]
==> Get:4 focal-security InRelease [114 kB]
==> Get:5 focal InRelease [2109 B]
==> Get:6 focal/universe amd64 Packages [8628 kB]
==> Get:7 focal/universe Translation-en [5124 kB]
==> Get:8 focal/universe amd64 c-n-f Metadata [265 kB]
==> Get:9 focal/multiverse amd64 Packages [144 kB]
==> Get:10 focal/multiverse Translation-en [104 kB]
==> Get:11 focal/multiverse amd64 c-n-f Metadata [9136 B]
==> Get:12 focal-updates/universe amd64 Packages [928 kB]
==> Get:13 focal-updates/universe Translation-en [208 kB]
==> Get:14 focal-updates/universe amd64 c-n-f Metadata [20.8 kB]
==> Get:15 focal-updates/multiverse amd64 Packages [24.4 kB]
==> Get:16 focal-updates/multiverse Translation-en [7336 B]
==> Get:17 focal-updates/multiverse amd64 c-n-f Metadata [596 B]
==> Get:18 focal-backports/main amd64 Packages [44.5 kB]
==> Get:19 focal-backports/main Translation-en [10.9 kB]
==> Get:20 focal-backports/main amd64 c-n-f Metadata [980 B]
==> Get:21 focal-backports/restricted amd64 c-n-f Metadata [116 B]
==> Get:22 focal-backports/universe amd64 Packages [23.7 kB]
==> Get:23 focal-backports/universe Translation-en [15.9 kB]
==> Get:24 focal-backports/universe amd64 c-n-f Metadata [860 B]
==> Get:25 focal-backports/multiverse amd64 c-n-f Metadata [116 B]
==> Get:26 focal/main amd64 Packages [12.9 kB]
==> Get:27 focal/main all Packages [462 B]
==> Get:28 focal-security/main amd64 Packages [1544 kB]
==> Get:29 focal-security/main Translation-en [264 kB]
==> Get:30 focal-security/restricted amd64 Packages [1001 kB]
==> Get:31 focal-security/restricted Translation-en [142 kB]
==> Get:32 focal-security/universe amd64 Packages [707 kB]
==> Get:33 focal-security/universe Translation-en [127 kB]
==> Get:34 focal-security/universe amd64 c-n-f Metadata [14.5 kB]
==> Get:35 focal-security/multiverse amd64 Packages [22.2 kB]
==> Get:36 focal-security/multiverse Translation-en [5376 B]
==> Get:37 focal-security/multiverse amd64 c-n-f Metadata [512 B]
==> E: Could not open file /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_focal-updates_restricted_cnf_Commands-amd64 - open (2: No such file or directory)
==> Traceback (most recent call last):
==>   File "/usr/lib/cnf-update-db", line 27, in <module>
==>     col.create(db)
==>   File "/usr/lib/python3/dist-packages/CommandNotFound/db/", line 95, in create
==>     self._fill_commands(con)
==>   File "/usr/lib/python3/dist-packages/CommandNotFound/db/", line 141, in _fill_commands
==>     raise subprocess.CalledProcessError(returncode=sub.returncode,
==> subprocess.CalledProcessError: Command '/usr/lib/apt/apt-helper cat-file /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_focal-updates_restricted_cnf_Commands-amd64' returned non-zero exit status 100.

I’m really lost as to how this could even happen, let alone where to look. All I can think is that maybe there’s a different environment variable or something, but even then I can’t see how such a difference would make it into an EC2 instance started by a process running on a runner…!?
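To test the environment-variable theory, one option would be to dump a masked, sorted copy of the environment in both a scheduled and a manual run, then diff the two files afterwards. A sketch (the grep pattern for masking secrets is only a starting point, and the file name is arbitrary):

```shell
# Capture the CI environment for later comparison. CI_PIPELINE_SOURCE is
# "schedule" for scheduled runs, so each run type writes its own file;
# mask anything that looks like a secret before it ends up in an artifact.
env | grep -ivE 'token|password|secret|key' | sort \
  > "env-${CI_PIPELINE_SOURCE:-manual}.txt"
wc -l "env-${CI_PIPELINE_SOURCE:-manual}.txt"
```

Saving the file as a job artifact in both runs and diffing the pair would reveal any variable that differs between a scheduled and a manual pipeline.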

I appreciate this may not be a GitLab issue directly, but any clues or ideas would be most welcome!