Pipeline randomly times out when downloading artifacts from previous job

Hi there!
I am having rather random issues with CI/CD in one of our projects. Sometimes the pipeline fails when trying to download artifacts from the previous stage (see screen). It just times out (we have a 60-minute timeout). Sometimes restarting the job fixes it. Sometimes the whole pipeline needs to be started again. And I am unable to determine what the hell is going on.

We have:

  • GitLab on-premises
  • docker-windows executor
  • shell is PowerShell
  • all jobs are run in Windows environment

The same problem also shows up when trying to copy a repo from another project. It fails randomly.

Could you please point me in some direction? Debugging the pipeline does not really show anything, nor did I notice anything in the Docker logs or the Windows event logs.
I have searched for similar issues, but no luck so far.

I suspect some network issue, but don't really know where to start.

We had the same problem with CI/CD jobs timing out when downloading artifacts from a previous job, but ours happened 100% of the time. Our GitLab Runner runs on a Windows Server VM with docker-windows installed.

We noticed that, while we were able to manually start up a Windows Docker container on the Runner, all network speeds were extremely slow inside that container (and not just to our on-prem GitLab server). That led us to this post about enabling IP forwarding:

# Grab the name of your network card
ipconfig

# Enable IP forwarding on that interface (run from an elevated prompt)
netsh int ipv4 set interface "My NIC name" forwarding=enable

Docker on Windows apparently doesn't set up container networking in a well-integrated way: the default setup does technically work, but it's so slow as to be basically unusable. If you enable IP forwarding (basically allowing the host to route packets between its interfaces, like a router), speeds drastically improve.

Do note that allowing a workstation/endpoint to act as a router does pose a security risk, so bear that in mind when applying the above change.
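
If you want to confirm that you are hitting the same thing before (or after) changing anything, here is a rough PowerShell sketch that shows the current forwarding state of the NIC and compares the download time on the host with the download time inside a container. The NIC name, the image tag, and the URL are placeholders I made up, not values from our setup; swap in your own, and pick an image that matches your host's Windows version.

```powershell
# Assumptions (placeholders, replace all three with your own values).
$nicName = "My NIC name"                                    # interface alias, as shown by ipconfig
$image   = "mcr.microsoft.com/windows/servercore:ltsc2022"  # must match the host's Windows version
$url     = "https://gitlab.example.com/large-file.zip"      # hypothetical URL of any reasonably large file

# 1. Show whether forwarding is currently enabled on the interface.
Get-NetIPInterface -InterfaceAlias $nicName -AddressFamily IPv4 |
    Select-Object InterfaceAlias, ifIndex, Forwarding

# Invoke-WebRequest's progress bar can itself slow downloads, so turn it off.
$ProgressPreference = 'SilentlyContinue'

# 2. Time the download on the host.
$hostSeconds = (Measure-Command {
    Invoke-WebRequest -Uri $url -OutFile "$env:TEMP\speedtest.bin" -UseBasicParsing
}).TotalSeconds

# 3. Time the same download inside a throwaway container.
$inner = "`$ProgressPreference = 'SilentlyContinue'; " +
         "(Measure-Command { Invoke-WebRequest -Uri '$url' -OutFile C:\speedtest.bin -UseBasicParsing }).TotalSeconds"
$containerSeconds = docker run --rm $image powershell -Command $inner

"Host: $hostSeconds s, container: $containerSeconds s"
```

If the container number is far worse than the host number, the slowdown is in container networking rather than in GitLab itself, which is what we saw.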