Runtime: may need to increase max user processes (ulimit -u)

I am getting strange errors about failures to create threads on our runner machine (a 40-core beast).

Jobs are now randomly crashing with:

Checking out 90fbd6ce as master...
Skipping Git submodules setup
Checking cache for master-2...
No URL provided, cache will be not downloaded from shared cache server. Instead a local version of cache will be extracted. 
runtime: failed to create new OS thread (have 36 already; errno=11)
runtime: may need to increase max user processes (ulimit -u)
fatal error: newosproc

runtime stack:
runtime.throw(0x96eaaa, 0x9)
	/usr/local/go/src/runtime/panic.go:596 +0x95
runtime.newosproc(0xc4202cb400, 0xc422ef4000)
	/usr/local/go/src/runtime/os_linux.go:163 +0x18c
runtime.newm(0x98e5b0, 0x0)
	/usr/local/go/src/runtime/proc.go:1628 +0x137

And like this:

Running with gitlab-runner 11.5.1 (7f00c780)
on ECAP Runner 4a8ecb4c
Using Docker executor with image ...
Pulling docker image ...
Using docker image sha256:1c9dc8ddc5a1c75dda2c564e68cca6dbfd21ba997b7ee5824fceb86778907d02 for ...
ERROR: Job failed (system failure): Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"process_linux.go:385: running prestart hook 0 caused \\\"error running hook: exit status 2, stdout: , stderr: runtime/cgo: pthread_create failed: Resource temporarily unavailable\\\\nSIGABRT: abort\\\\nPC=0x7f44d224cfff m=0 sigcode=18446744073709551610\\\\n\\\\ngoroutine 0 [idle]:\\\\nruntime: unknown pc 0x7f44d224cfff\\\\nstack: frame={sp:0x7ffca8bdeaa8, fp:0x0} stack=[0x7ffca83e00b8,0x7ffca8bdf0e0)\\\\n00007ffca8bde9a8: 00005571234dd86f <runtime.adjustframe+175> 0000557125819518 \\\\n00007ffca8bde9b8: 0000557126282f00 00007ffc00000000 \\\\n00007ffca8bde9c8: 00005571234f5020 <runtime.goexit+0> 00007ffca8bdee10 \\\\n00007ffca8bde9d8: 00000000ffffffff 00000000a8bde970 \\\\n00007ffca8bde9e8: 00005571234f5020 <runtime.goexit+0> 00005571234e34a1 <runtime.funcspdelta+97> \\\\n00007ffca8bde9f8: 0000557125819518 0000557126282f00 \\\\n00007ffca8bdea08:

And this:

-- Docs:
--------- generated xml file: /builds/km3py/km3pipe/reports/junit.xml ----------
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! KeyboardInterrupt !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
/builds/km3py/km3pipe/venv/lib/python3.6/site-packages/numpy/core/ KeyboardInterrupt
(to show a full traceback on KeyboardInterrupt use --fulltrace)
========================== 4 warnings in 0.51 seconds ==========================
Makefile:29: recipe for target 'test' failed
make: *** [test] Segmentation fault (core dumped)

Any ideas on how to debug this? When I restart the jobs, they usually run through without errors…

I checked ulimit, but that does not seem to be the problem:

# ulimit -u

It seems that Docker instances were somehow being kept open. I restarted the Docker service and the runners now work as they should. I have no idea what happened there, but I will leave this open and report back when I have news…
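
In case it helps anyone who runs into the same errors: errno=11 is EAGAIN, and according to the pthread_create man page it can come from the per-user process limit (ulimit -u) as well as from the kernel-wide limits threads-max and pid_max. Below is a minimal sketch for comparing the current thread count against those limits the next time jobs start failing. It assumes a Linux host with /proc mounted and a stat that supports -c; the function names (count_user_threads, kernel_limit, report) are made up for illustration, and the Docker line only runs if the docker CLI is on the PATH.

```shell
#!/bin/sh
# Diagnostic sketch (assumption: Linux with /proc; helper names are hypothetical).

# Count all threads (tasks) owned by a numeric UID by walking /proc.
count_user_threads() {
    uid="$1"
    total=0
    for dir in /proc/[0-9]*; do
        # Processes can exit while we iterate; skip anything that vanished.
        owner=$(stat -c %u "$dir" 2>/dev/null) || continue
        [ "$owner" = "$uid" ] || continue
        n=$(ls "$dir/task" 2>/dev/null | wc -l)
        total=$((total + n))
    done
    echo "$total"
}

# Print a kernel-wide limit from /proc/sys/kernel, or "n/a" if unreadable.
kernel_limit() {
    cat "/proc/sys/kernel/$1" 2>/dev/null || echo "n/a"
}

# Print the current thread count next to the limits that can cause EAGAIN.
report() {
    me=$(id -u)
    echo "threads owned by uid $me:        $(count_user_threads "$me")"
    # Some minimal shells do not support -u; fall back to "n/a" in that case.
    echo "ulimit -u (max user processes): $(ulimit -u 2>/dev/null || echo "n/a")"
    echo "kernel.threads-max:             $(kernel_limit threads-max)"
    echo "kernel.pid_max:                 $(kernel_limit pid_max)"
    if command -v docker >/dev/null 2>&1; then
        # Containers in any state, including stopped ones that were never removed.
        echo "docker containers (all states): $(docker ps -aq 2>/dev/null | wc -l)"
    fi
}

report
```

Run it as the user that owns the runner (or as root) while the failures are happening. If the thread count is close to one of the limits, or the container count keeps growing between jobs, that points to where to look next.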