DIND problem

Hello,
could anyone please help with the well-known DinD (Docker-in-Docker) problem?

docker_build:
  stage: build
  image: docker:latest
  services:
    - docker:dind
  script:
    - docker info

Running it locally with gitlab-runner exec docker docker_build produces the following log:

*** WARNING: Service runner--project-0-concurrent-0-docker-0 probably didn't start properly.

2022-06-07T19:00:57.422929268Z ip: can't find device 'ip_tables'
2022-06-07T19:00:57.423473313Z ip_tables 32768 0
2022-06-07T19:00:57.423482386Z x_tables 53248 5 xt_conntrack,xt_MASQUERADE,xt_addrtype,nft_compat,ip_tables
2022-06-07T19:00:57.423920067Z modprobe: can't change directory to '/lib/modules': No such file or directory
2022-06-07T19:00:57.425758803Z mount: permission denied (are you root?)
2022-06-07T19:00:57.425766750Z Could not mount /sys/kernel/security.
2022-06-07T19:00:57.425770845Z AppArmor detection and --privileged mode might break.
2022-06-07T19:00:57.426473918Z mount: permission denied (are you root?)

So the service can’t start, even though privileged = true is set.
I’ve tried a lot of things; nothing helped.
The runner version is 15.0.0.
I tried on AlmaLinux 8.5 (fully updated, SELinux disabled) and on Debian 11.3 (fully updated, AppArmor removed), with the same result.

Could anyone please advise how to fix it?


Hi,

I’m not sure I understand the problem. I assume you have your own GitLab server and are now trying to add a GitLab Runner that installs and runs Docker to build container images? Which Docker version is involved, and how is the runner configured in the config.toml file?

What happens in the web UI when the config snippet runs in a pipeline and the runner picks up the job? It may also help to enable debug logging for the runner, to see the actual error and narrow down potential fixes.

I’m not sure whether a local gitlab-runner exec command works with DinD; I’ve never tried that.


Hi,

Thank you for your answer.

Yes, I have my own GitLab server with a runner attached to it. Docker is 20.10.16 (the latest at the time of writing).
The config contents are as follows:

concurrent = 1
check_interval = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "r2..."
  url = "https://.../"
  token = "..."
  executor = "docker"
  [runners.custom_build_dir]
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
    [runners.cache.azure]
  [runners.docker]
    tls_verify = false
    image = "docker:20.10.16"
    privileged = true
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/certs/client", "/cache"]
    shm_size = 0

I hadn’t tried running it in a pipeline because I just wanted to create and debug the CI configuration before committing. AFAIK, the local command line is intended for exactly this, so it would be very strange if the local runner didn’t support it.

OK, I’ve just run it the usual way (committed and let GitLab run CI) and got the same error: the dind container was down. This way I don’t see the dind container logs, only the final error: error during connect: Post "http://docker:2375/v1.24/..."

Running the runner with --debug gives no additional info:

[…]
Starting service docker:dind …
Looking for image docker:dind … job=1 project=0
No credentials found for docker:dind job=1 project=0
Pulling docker image docker:dind …
Using docker image sha256:5dbe252bd9afb23859f250989da416b2cd8ab30f4b61a2bc8fca6f9b05d7e665 for docker:dind with digest docker@sha256:44067c181dc5cc282457c76aa5afe782cd35f31d10144427728a422b3f84485e …
Removing container runner--project-0-concurrent-0-f883422cbdf7ff4e-docker-0 job=1 project=0
Disconnecting container runner--project-0-concurrent-0-f883422cbdf7ff4e-docker-0 from networks job=1 project=0
Removing container runner--project-0-concurrent-0-f883422cbdf7ff4e-docker-0 finished with error Error: No such container: runner--project-0-concurrent-0-f883422cbdf7ff4e-docker-0 (docker.go:759:0s) job=1 project=0
Creating service container runner--project-0-concurrent-0-f883422cbdf7ff4e-docker-0 … job=1 project=0
Starting service container runner--project-0-concurrent-0-f883422cbdf7ff4e-docker-0 (034d5bd0a44f1cd32a32f7a115e6e70cabaeb46d0c611fc9cd40e98baf9437f8)… job=1 project=0
Created service docker:dind as 034d5bd0a44f1cd32a32f7a115e6e70cabaeb46d0c611fc9cd40e98baf9437f8 job=1 project=0
Waiting for services to be up and running (timeout 30 seconds)…
Looking for prebuilt image registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-febb2a09… job=1 project=0
Creating service healthcheck container runner--project-0-concurrent-0-f883422cbdf7ff4e-docker-0-wait-for-service… job=1 project=0
Starting service healthcheck container runner--project-0-concurrent-0-f883422cbdf7ff4e-docker-0-wait-for-service (36dbb74b8978086f3944a3666ce63eb3493b92f140900b577bfdfb320c34e2b2)… job=1 project=0
Removing container 36dbb74b8978086f3944a3666ce63eb3493b92f140900b577bfdfb320c34e2b2 job=1 project=0
Disconnecting container 36dbb74b8978086f3944a3666ce63eb3493b92f140900b577bfdfb320c34e2b2 from networks job=1 project=0
Removed container 36dbb74b8978086f3944a3666ce63eb3493b92f140900b577bfdfb320c34e2b2 job=1 project=0

*** WARNING: Service runner--project-0-concurrent-0-f883422cbdf7ff4e-docker-0 probably didn't start properly.

Health check error:
start service container: Error response from daemon: Cannot link to a non running container: /runner--project-0-concurrent-0-f883422cbdf7ff4e-docker-0 AS /runner--project-0-concurrent-0-f883422cbdf7ff4e-docker-0-wait-for-service/service (docker.go:1166:0s)

Service container logs:
2022-06-09T23:28:04.447648608Z ip: can't find device 'ip_tables'
2022-06-09T23:28:04.448371812Z ip_tables 32768 0
2022-06-09T23:28:04.448380329Z x_tables 53248 5 xt_conntrack,xt_MASQUERADE,xt_addrtype,nft_compat,ip_tables
2022-06-09T23:28:04.448673300Z modprobe: can't change directory to '/lib/modules': No such file or directory
2022-06-09T23:28:04.452261756Z mount: permission denied (are you root?)
2022-06-09T23:28:04.452316852Z Could not mount /sys/kernel/security.
2022-06-09T23:28:04.452325223Z AppArmor detection and --privileged mode might break.
2022-06-09T23:28:04.453069245Z mount: permission denied (are you root?)


[…]
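The Post "http://docker:2375/..." error from the pipeline run suggests the job's Docker client fell back to the unencrypted port, while the docker:dind image enables TLS by default and listens on 2376. GitLab's documentation pairs a runner configured like the one above (privileged = true plus the /certs/client volume) with TLS variables in the job. A minimal sketch, assuming the 20.10.16 images mentioned above; it removes the port mismatch, but it will not help while the service container itself dies with "mount: permission denied", as in these logs.

```yaml
docker_build:
  stage: build
  image: docker:20.10.16
  services:
    - docker:20.10.16-dind
  variables:
    # The dind service writes its TLS certificates under /certs; the client part lands in
    # /certs/client, which config.toml shares between containers via its volumes entry.
    DOCKER_TLS_CERTDIR: "/certs"
    # Point the client at the TLS port explicitly instead of relying on the image's
    # auto-detection of certificates.
    DOCKER_HOST: tcp://docker:2376
    DOCKER_TLS_VERIFY: 1
    DOCKER_CERT_PATH: "$DOCKER_TLS_CERTDIR/client"
  script:
    - docker info
```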

If DinD isn’t strictly required, I’d recommend looking into Kaniko. DinD has some fundamental drawbacks (such as requiring privileged containers), which Kaniko sidesteps entirely.
I’ve also written a simple wrapper for Kaniko in the form of a container which allows me to build containers via CI by just passing a few environment variables. You can take a look at it here.
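A plain Kaniko job along the lines of GitLab's documentation might look like the sketch below: it runs the executor's debug image (which includes a shell) with the entrypoint cleared, writes credentials for the project's container registry, and builds from a Dockerfile without any Docker daemon or privileged mode. The job name, a Dockerfile at the repository root, and pushing to the built-in GitLab registry (the predefined CI_REGISTRY* variables) are assumptions; this is not the wrapper mentioned above.

```yaml
build-image-kaniko:
  stage: build
  image:
    # The debug tag ships a busybox shell, which the runner needs to execute the script.
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: [""]
  script:
    # Credentials for the project's built-in GitLab container registry.
    - mkdir -p /kaniko/.docker
    - echo "{\"auths\":{\"${CI_REGISTRY}\":{\"auth\":\"$(printf "%s:%s" "${CI_REGISTRY_USER}" "${CI_REGISTRY_PASSWORD}" | base64 | tr -d '\n')\"}}}" > /kaniko/.docker/config.json
    # Build and push; no Docker daemon and no privileged mode involved.
    - /kaniko/executor
      --context "${CI_PROJECT_DIR}"
      --dockerfile "${CI_PROJECT_DIR}/Dockerfile"
      --destination "${CI_REGISTRY_IMAGE}:${CI_COMMIT_SHORT_SHA}"
```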

@aljaxus thank you for the info, I’ll take a look at it.
After googling, I got the impression that everyone except me had solved the problem and had a working DinD =)

OK, at least it starts. But how can I provide it with a built release executable so that it builds a Docker image containing it?

I have the same issue. Did you find a way to solve it?
I need to use DinD, so other options don’t help me.

I assume you want to build a Docker image that contains a binary (or a .jar, .exe, etc.) built earlier in the CI pipeline?

You would use CI artifacts. Define two CI jobs: the first builds your executable, and the second (the one using build-oci) builds your container and pushes it to the registry. The second job should depend on the first.

@aljaxus
I understand how to do this with CI in general. The problem is how to debug a single job, because gitlab-runner exec can’t execute multiple jobs. And the actual problem is that build_dir is created in a volume and then populated with files from Git. In other words, I don’t understand how to build a binary in one run of gitlab-runner and then feed that binary into another run of gitlab-runner to build an image.

As already said, use CI artifacts to get the workflow going in the first place.

The workflow would look something like this:

  • Start job build-bin
    • start executor
    • pull repo
    • run job script
      • build binary1
    • upload binary1 as an artifact to gitlab
  • start job build-container
    • which depends on build-bin, use job dependencies
    • start executor
    • pull repo
    • download artifacts
    • run script
      • build container
      • push container to registry
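The workflow above can be sketched as a two-stage pipeline. Assumptions: the build command (make an_executable) and the artifact name an_executable are placeholders, the repository contains a Dockerfile that copies an_executable into the image, and the image goes to the project's built-in GitLab registry. Jobs download artifacts from earlier stages automatically; dependencies restricts that to build-bin.

```yaml
stages:
  - build
  - package

build-bin:
  stage: build
  script:
    - make an_executable          # hypothetical build command
  artifacts:
    paths:
      - an_executable
    expire_in: 1 week

build-container:
  stage: package
  image: docker:20.10.16
  services:
    - docker:20.10.16-dind
  variables:
    DOCKER_TLS_CERTDIR: "/certs"
  # Only build-bin's artifacts are downloaded; they are unpacked into the project
  # directory before the script runs, so the Dockerfile can COPY an_executable.
  dependencies:
    - build-bin
  script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
    - docker build -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA" .
    - docker push "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"
```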

@aljaxus thank you for the idea; I misread it earlier (sorry about that).

No problem. Did it work? :smiley:

I don’t really understand how to download the artifact.

To create (upload) it, I write something like this:

  artifacts:
    paths:
      - an_executable
    expire_in: 1 week

But I couldn’t find a simple way to download it.
Or did you mean I should do something like this:

  docker_build:
    stage: build
    script:
      - wget -c "https://example.com/<namespace>/<project>/-/jobs/artifacts/<ref>/file/<path>?job=<job_name>"

If so, that doesn’t seem very convenient, since I’d have to specify a domain, a namespace and a project name, all of which may change.
In that case I’d rather use an sftp link like sftp://host.local/home/user/src/<project>/<binary> (to download the most recently built binary) while debugging .gitlab-ci.yml.
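Inside a single pipeline no URL is needed: a later-stage job that lists the build job under dependencies (or needs) gets its artifacts unpacked into the working directory automatically. If you do want to fetch an artifact by URL, predefined CI/CD variables avoid hard-coding the domain, namespace and project. A sketch, assuming the Jobs API endpoint for downloading a single artifact file by ref and job name, a job token that is allowed to read this project's artifacts, and the example names build-bin and an_executable:

```yaml
docker_build:
  stage: build
  image: docker:20.10.16
  script:
    # All CI_* values are predefined by GitLab, so nothing server- or project-specific
    # is hard-coded. The whole line is single-quoted because YAML would otherwise
    # read the colon-space in "JOB-TOKEN: ..." as a mapping.
    - 'wget -O an_executable --header "JOB-TOKEN: $CI_JOB_TOKEN" "$CI_API_V4_URL/projects/$CI_PROJECT_ID/jobs/artifacts/$CI_COMMIT_REF_NAME/raw/an_executable?job=build-bin"'
    - chmod +x an_executable
```

As I understand the API docs, this endpoint serves artifacts from the latest successful pipeline for the ref, so it returns a binary from a previous pipeline rather than from a job still running in the current one.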

Hi,

This took me a bit longer; over the past week I kept researching, pausing and picking it up again. The problem is tricky.

GitLab Runner starts the docker:dind image as a service container and then tries to determine how long to wait until the service is ready.

At some point the built-in health check fails and errors out, and all subsequent steps then fail because the service container is missing.

The problem is best captured in the GitLab.org / GitLab issue "Gitlab CI does not know how to wait for services" (#24197).

This comment sheds some light on how GitLab Runner creates and waits for the DinD service container, and why the built-in health check then fails.

Unfortunately, I cannot recommend a workaround this time. I’d suggest reviewing the issue discussion and commenting there yourself. Hope this helps.

Cheers,
Michael

@dnsmichi
Thanks for your reply.

Unfortunately, that isn’t the case here: the service container genuinely fails to start, with a consistent error, as shown in the log in DIND problem - #5 by elfuego.

That looks a lot like the problem I had when the runner in use (by accident) had privileged = false instead of privileged = true.

Just for completeness, here’s part of the output I got from the job:

Using docker image sha256:ad6479b49f1e99b76779e8d08bff4cf388cd23d435bf248337998905fcdf310e for docker:dind with digest docker@sha256:28c6ddb5d7bfdc019fb39cc2797351a6e3e81458ad621808e5e9dd3e41538c77 ...
Waiting for services to be up and running (timeout 30 seconds)...
*** WARNING: Service runner-nfyusd5-project-5099-concurrent-0-3b2da7a207ec1edc-docker-0 probably didn't start properly.
Health check error:
start service container: Error response from daemon: Cannot link to a non running container: /runner-nfyusd5-project-5099-concurrent-0-3b2da7a207ec1edc-docker-0 AS /runner-nfyusd5-project-5099-concurrent-0-3b2da7a207ec1edc-docker-0-wait-for-service/service (services.go:187:0s)
Service container logs:
2023-06-23T14:46:39.148836039Z ip: can't find device 'ip_tables'
2023-06-23T14:46:39.149495509Z ip_tables              32768  0 
2023-06-23T14:46:39.149544519Z x_tables               53248  9 xt_comment,xt_multiport,xt_tcpudp,xt_state,xt_conntrack,xt_MASQUERADE,xt_addrtype,nft_compat,ip_tables
2023-06-23T14:46:39.149872739Z modprobe: can't change directory to '/lib/modules': No such file or directory
2023-06-23T14:46:39.151905937Z mount: permission denied (are you root?)
2023-06-23T14:46:39.151965957Z Could not mount /sys/kernel/security.
2023-06-23T14:46:39.151974317Z AppArmor detection and --privileged mode might break.
2023-06-23T14:46:39.153114677Z mount: permission denied (are you root?)
*********
Pulling docker image docker:stable ...

I found this question while searching for docker "ip: can't find device 'ip_tables'", so even if this wasn’t the cause of @elfuego’s original problem (he did say privileged = true was set), it might help others.
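If a DinD job can land on a runner without privileged = true, as happened here, one way to guard against it is to tag the privileged runner and pin DinD jobs to that tag, so runners without the tag never pick them up. A sketch based on the job from the first post; the tag name privileged-docker is hypothetical and has to match a tag assigned to the privileged runner:

```yaml
docker_build:
  stage: build
  image: docker:latest
  services:
    - docker:dind
  # Only runners carrying this (hypothetical) tag are allowed to run the job.
  tags:
    - privileged-docker
  script:
    - docker info
```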
