External runners get stuck on every job with "Appending trace to coordinator... failed code=501"

We are running GitLab CE in an on-prem deployment. Runners within the LAN work perfectly fine on various OSes.
We have now configured reverse proxies to make the on-prem deployment reachable from the Internet, so that we can do some magic with macOS and Swift. Strangely, though, our GitLab runners on Mac (and on Linux for testing) are able to register and pick up jobs, but seem to fail to report the job status back to GitLab.

This is what it looks like in our Docker-based runner on a Mac when we run a sample pipeline that does nothing more than a simple echo in a shell:

Using Shell executor...                             job=233 project=146 runner=********
No referees configured                              job=233 project=146 runner=********
Executing build stage                               build_stage=prepare_script job=233 project=146 runner=********
Waiting for signals...                              job=233 project=146 runner=********
Executing build stage                               build_stage=get_sources job=233 project=146 runner=********
Executing build stage                               build_stage=restore_cache job=233 project=146 runner=********
Executing build stage                               build_stage=download_artifacts job=233 project=146 runner=********
Executing build stage                               build_stage=build_script job=233 project=146 runner=********
Executing build stage                               build_stage=after_script job=233 project=146 runner=********
Executing build stage                               build_stage=archive_cache job=233 project=146 runner=********
Feeding runners to channel                          builds=1
Executing build stage                               build_stage=upload_artifacts_on_success job=233 project=146 runner=********
Skipping referees execution                         job=233 project=146 runner=********
Job succeeded                                       duration=1.4166992s job=233 project=146 runner=********
Dialing: tcp ***.*******.******.**:443 ...         
WARNING: Failed to parse "X-GitLab-Trace-Update-Interval" header  error=strconv.Atoi: parsing "": invalid syntax header-value= job=233 runner=********
WARNING: Appending trace to coordinator... failed   code=501 job=233 job-log= job-status= runner=******** sent-log=0-2011 status=501 Not Implemented update-interval=0s

Within GitLab the job is marked as “Running” but never finishes.

Does anyone have an idea how to troubleshoot this, and what might be the reason for the “Appending trace to coordinator… failed” message?
From my understanding, the 'Failed to parse "X-GitLab-Trace-Update-Interval" header' message is more or less cosmetic and can be ignored.

Thanks in advance for your thoughts and ideas!

Cheers
Carsten

We also run GitLab on premises, with Kubernetes runners. We see the same warning, and the job hangs without processing any further.

WARNING: Failed to parse "X-GitLab-Trace-Update-Interval" header error=strconv.Atoi: parsing "": invalid syntax header-value= job=17419 runner=ssfCjVRA
WARNING: Appending trace to coordinator... aborted code=403 job=17419 job-log= job-status=canceled runner=ssfCjVRA sent-log=5497-6836 status=403 Forbidden update-interval=0s
WARNING: Submitting job to coordinator... aborted code=403 job=17419 job-status=canceled runner=ssfCjVRA
WARNING: Failed to process runner builds=0 error=canceled executor=kubernetes runner=ssfCjVRA

Hi,

Which versions of GitLab and the CI runner are involved here? I suspect that the two differ, and possibly the runner is too old to parse the header correctly.

Cheers,
Michael

Hi @dnsmichi, the GitLab version is 13.0.5-ee and the GitLab Runner version is 13.0.1 (we deployed the runners using the Helm chart, chart version 0.17.1).

Hi,

We are seeing the same problem. Sometimes jobs just get stuck in the running state in GitLab, and nothing is shown in the GitLab Runner log.
Only when the job is cancelled and retried does something appear in the GitLab Runner log.

Here’s the relevant part of the log:

    kesä 23 09:36:37 boulder gitlab-runner[85258]: Checking for jobs... received                       job=719 repo_url=<redacted> runner=8UM6xfC-
    kesä 23 09:36:37 boulder gitlab-runner[85258]: Checking for jobs... received                       job=719 repo_url=<redacted> runner=8UM6xfC-
    kesä 23 09:56:42 boulder gitlab-runner[85258]: WARNING: Submitting job to coordinator... aborted   code=403 job=719 job-status=canceled runner=8UM6xfC-
    kesä 23 09:56:42 boulder gitlab-runner[85258]: WARNING: Submitting job to coordinator... aborted   code=403 job=719 job-status=canceled runner=8UM6xfC-
    kesä 23 09:57:12 boulder gitlab-runner[85258]: WARNING: Timed out waiting for logs to finish copying from container  job=719 project=46 runner=8UM6xfC-
    kesä 23 09:57:12 boulder gitlab-runner[85258]: WARNING: Timed out waiting for logs to finish copying from container  job=719 project=46 runner=8UM6xfC-
    kesä 23 09:57:12 boulder gitlab-runner[85258]: WARNING: Failed to inspect build container 7775286b3aecca2a91b7e49046c66d576f553eee98e408d0b46f082643a2e9ed context canceled (docker_command.go:78:0s)  job=719 project=46 runner=8UM6xfC-
    kesä 23 09:57:12 boulder gitlab-runner[85258]: WARNING: Failed to inspect build container 7775286b3aecca2a91b7e49046c66d576f553eee98e408d0b46f082643a2e9ed context canceled (docker_command.go:78:0s)  job=719 project=46 runner=8UM6xfC-
    kesä 23 09:57:12 boulder gitlab-runner[85258]: WARNING: Job failed: canceled                       duration=20m34.796206625s job=719 project=46 runner=8UM6xfC-
    kesä 23 09:57:12 boulder gitlab-runner[85258]: WARNING: Job failed: canceled                       duration=20m34.796206625s job=719 project=46 runner=8UM6xfC-
    kesä 23 09:57:12 boulder gitlab-runner[85258]: WARNING: Failed to parse "X-GitLab-Trace-Update-Interval" header  error=strconv.Atoi: parsing "": invalid syntax header-value= job=719 runner=8UM6xfC-
    kesä 23 09:57:12 boulder gitlab-runner[85258]: WARNING: Failed to parse "X-GitLab-Trace-Update-Interval" header  error=strconv.Atoi: parsing "": invalid syntax header-value= job=719 runner=8UM6xfC-
    kesä 23 09:57:12 boulder gitlab-runner[85258]: WARNING: Appending trace to coordinator... aborted  code=403 job=719 job-log= job-status=canceled runner=8UM6xfC- sent-log=8569-9323 status=403 Forbidden update-interval=0s
    kesä 23 09:57:12 boulder gitlab-runner[85258]: WARNING: Appending trace to coordinator... aborted  code=403 job=719 job-log= job-status=canceled runner=8UM6xfC- sent-log=8569-9323 status=403 Forbidden update-interval=0s
    kesä 23 09:57:12 boulder gitlab-runner[85258]: WARNING: Submitting job to coordinator... aborted   code=403 job=719 job-status=canceled runner=8UM6xfC-
    kesä 23 09:57:12 boulder gitlab-runner[85258]: WARNING: Submitting job to coordinator... aborted   code=403 job=719 job-status=canceled runner=8UM6xfC-
    kesä 23 09:57:13 boulder gitlab-runner[85258]: WARNING: Failed to process runner                   builds=1 error=canceled executor=docker runner=8UM6xfC-
    kesä 23 09:57:13 boulder gitlab-runner[85258]: WARNING: Failed to process runner                   builds=1 error=canceled executor=docker runner=8UM6xfC-

GitLab version: 13.0.6-ee
GitLab Runner version: 13.0.1

Hi,

It seems that the warning is not related to the actual error, but it obscures it a bit. The warning might be removed with this MR.

@Hannu, where is the runner deployed? On Kubernetes as well? Either way, please go ahead and report an issue for further analysis.

Cheers,
Michael