Gitlab runner (shell): timeout error while the script actually completes

Hello, I have a Gitlab Community Edition instance running on a Synology (via Docker), that I’m using for a Ruby on Rails project (dockerized also). To ‘deploy’, I’m using Gitlab CI and a runner (shell executor) that runs on a Mac (Gitlab versions: 12.6.2 for the Gitlab instance, 12.6, the Mac uses the last version of Mojave and the last version of Docker). The runner is triggered to commits on a given branch.

It worked fine for about a year now. Since my last update of the Ruby on Rails project (to upgrade Ruby to 2.6.5), the Gitlab Runner seems to ‘disconnect’ during the execution of the script, which causes the job to fail (timeout error). But looking at the test.log of the Rails application, the script has actually executed properly and the next step has also been executed properly.

Here’s the script section of the .gitlab-ci.yml file

- docker-compose -f docker-compose.base.yml -f docker-compose.test.yml build
- docker-compose -f docker-compose.base.yml -f docker-compose.test.yml run --rm web sleep 60 # to make sure that the DB is started
- docker-compose -f docker-compose.base.yml -f docker-compose.test.yml run --rm web curl -H "Content-Type:application/json" -XPOST -d '{"password":"dummypassord"}' -u neo4j:neo4j http://db:7474/user/neo4j/password # the database is cleared and the password must be changed

# the CI job log stops in the middle of this step (didn't find a reason), but the rails test.log shows that the migration completes
- docker-compose -f docker-compose.base.yml -f docker-compose.test.yml run --rm web bundle exec rails neo4j:migrate

# the rails test.log shows that this step has been completed, but it does not appear in the Gitlab job log
- docker-compose -f docker-compose.base.yml -f docker-compose.test.yml run --rm web bundle exec rails test

Does anyone have an idea of the origin of this problem (or where to search to debug it)? The fact that all steps of the script execute completely let me think that it’s rather a gitlab runner issue that a docker one. But I wouldn’t affirm it with certitude.

Thanks in advance!

Problem solved. For eventual reference, here’s how I managed to solve it.

  1. Have a good night of sleep :slight_smile:

  2. I discovered that the runner could be ran in debug mode (gitlab-runner --debug run). When the trace stop being sent, I observe the following messages in the log:

    WARNING: Appending trace to coordinator… failed code=500 job=688 job-log= job-status= runner=xxx sent-log=9247-37081 status=500 Internal Server Error

So it looked like the issue was actually on the gitlab instance side, rather than the runner.

  1. I searched in the api_json.log (from gitlab-rails), and did not see any trace of failure. But the cases where the requests succeeded where mentioned. So I thought about nginx, and saw that there were plenty of errors in the log files.

    2020/01/08 12:26:38 [crit] 389#0: *72802 open() “/var/opt/gitlab/nginx/client_body_temp/0000001249” failed (13: Permission denied), client: 192.168.1.55, server: gitlab.terminus.local, request: “PATCH /api/v4/jobs/687/trace HTTP/1.1”, host: “XXXX”

The permissions on var/opt/gitlab/nginx/client_body_temp were set on 777 for some reason, instead of 700. I changed that them to 700 and the problem was fixed.