CI pipeline's first job succeeds, but all other jobs are stuck in "created" state

We have multiple projects in GitLab within a single organisation, with CI pipelines configured in each project. One of the projects' CI pipelines started showing the following behaviour 3 days ago:

(1.) In the master branch's build pipeline with 3 configured jobs, only the first job is created and succeeds, while the other 2 jobs have been stuck in the "created" state for ~15 hours, with no GitLab runners assigned to them.

(2.) In the production build pipeline with 2 configured jobs, one job is created and succeeds, while the other sits in the "created" state indefinitely.

We have 2000 CI minutes, of which only 76 have been used to date. There has been no change to the .gitlab-ci.yml file for a month. Urgent help is appreciated. Thanks.

Hi,

can you share the .gitlab-ci.yml configuration and some pipeline screenshots to get a better picture?

Cheers,
Michael

Hi @dnsmichi. Here is the .gitlab-ci.yml file:

image: andthensome/docker-node-rsync:latest

variables:
  DOCKER_LOGIN_CMD: docker login -u ${CI_REGISTRY_USER} -p ${CI_REGISTRY_PASSWORD} ${CI_REGISTRY}
  IMAGE_PREFIX: ${CI_REGISTRY}/gamingengine/game

stages:
  - build
  - deploy

docker_staging:
  stage: build
  only:
    refs:
      - master
  image: docker:18
  services:
    - docker:dind
  script:
    - $DOCKER_LOGIN_CMD
    - docker build -t "$IMAGE_PREFIX/game-server:${CI_PIPELINE_IID}" -t "$IMAGE_PREFIX/game-server:latest" .
    - docker push "$IMAGE_PREFIX/game-server:${CI_PIPELINE_IID}"
    - docker push "$IMAGE_PREFIX/game-server:latest"

deploy_staging:
  stage: deploy
  only:
    refs:
      - master
  image: registry.gitlab.com/gamingengine/matcher/deploy:latest
  environment:
    name: staging
  script:
    - deploy -v staging -j ${MATCHMAKER_CONTAINERSHIP_PERSONAL_TOKEN} -t ${CI_PIPELINE_IID}


docker_production:
  stage: build
  only:
    - tags
  image: docker:18
  services:
    - docker:dind
  script:
    - $DOCKER_LOGIN_CMD
    - docker build -t "$IMAGE_PREFIX/game-server:${CI_COMMIT_TAG}" .
    - docker push "$IMAGE_PREFIX/game-server:${CI_COMMIT_TAG}"

# deploy_production:
#   stage: deploy
#   only:
#     - tags
#   image: registry.gitlab.com/gamingengine/matcher/deploy:latest
#   environment:
#     name: production
#   script:
#     - deploy -v production -j ${MATCHMAKER_CONTAINERSHIP_PERSONAL_TOKEN} -t ${CI_COMMIT_TAG}


deploy_droplet:
  stage: deploy
  script:
    # Install ssh-agent if not already installed, it is required by Docker.
    # (change apt-get to yum if you use a CentOS-based image)
    - 'which ssh-agent || ( apt-get update -y && apt-get install openssh-client -y )'

    # Run ssh-agent (inside the build environment)
    - eval $(ssh-agent -s)

    # Add the SSH key stored in SSH_PRIVATE_KEY variable to the agent store
    - ssh-add <(echo "$SSH_PRIVATE_KEY")

    # For Docker builds disable host key checking. Be aware that by adding this
    # you are susceptible to man-in-the-middle attacks.
    # WARNING: Use this only with the Docker executor, if you use it with shell
    # you will overwrite your user's SSH config.
    - mkdir -p ~/.ssh
    - '[[ -f /.dockerenv ]] && echo -e "Host *\n\tStrictHostKeyChecking no\n\n" > ~/.ssh/config'

    # Stop the game server
    - ssh -p22 root@136.154.278.187 "pm2 kill"
    # Copy code to server
    - rsync -r --exclude 'node_modules' --exclude '.git' . root@136.154.278.187:~/space/production
    # Install dependencies
    - ssh -p22 root@136.154.278.187 "cd ~/space/production && npm install"
    # Start the game server
    - ssh -p22 root@136.154.278.187 "cd ~/space/production && pm2 start scripts/server/main.js --name \"Space\" --node-args=\"--inspect\""
    # Compile and obfuscate client code
    - ssh -p22 root@136.154.278.187 "cd ~/space/production && npm run build"
    - ssh -p22 root@136.154.278.187 "cd ~/space/production && ./node_modules/.bin/uglifyjs --compress --mangle -o client.js -- client.js"
    # - ssh -p22 root@136.154.278.187 "cd ~/space/production && npm run obfuscate"
    # Copy client code to a location served by nginx
    - ssh -p22 root@136.154.278.187 "rm -rf /var/www/html/* && cd ~/space/production && cp -R index.html client.js beta-resources styles images audio /var/www/html && cp -R --parents scripts/client/libs scripts/client/shaders scripts/client/stats.js scripts/client/debug.js scripts/client/fps.js /var/www/html"
  only:
    - master
  environment:
    name: staging


deploy_gke:
  stage: deploy
  only:
    - tags
  image: registry.gitlab.com/gamingengine/matcher/deploy-gke:latest
  environment:
    name: gke_production
  script:
    - echo "$GKE_SERVICE_ACCOUNT_KEY" > key.json
    - gcloud auth activate-service-account --key-file=key.json
    - gcloud config set project space-224606
    - gcloud config set compute/region us-central1
    - gcloud container clusters get-credentials us-production-01 --region us-central1
    - deploy -v gke_production -g true -j ${MATCHMAKER_CONTAINERSHIP_PERSONAL_TOKEN} -t ${CI_COMMIT_TAG}

Here are screenshots of the indefinitely running pipeline and its jobs that are stuck in the "created" state:

The config looks good to me, thanks. I was looking for tags, wrong "only" settings, and anything else that could block the stages.

In that case, I would advise looking into the runner's logs, or syslog. They typically tell you when there are no more resources or when something else is stuck.
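On a self-hosted Linux runner, the logs Michael mentions can usually be inspected with the commands below. These are the common defaults, not confirmed for this particular setup; paths and service names vary by distribution and install method:

```shell
# systemd-based distros: GitLab Runner logs go to the journal
sudo journalctl -u gitlab-runner --since "1 hour ago"

# Ubuntu/Debian: runner messages also land in syslog
sudo grep -i gitlab-runner /var/log/syslog | tail -n 200

# Check the runner service and whether it can still reach GitLab
sudo gitlab-runner status
sudo gitlab-runner verify
```

Note this only applies to self-hosted runners; for gitlab.com shared runners there is no log access.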

Cheers,
Michael

@dnsmichi We are using shared runners on GitLab for our builds, so I am not sure how we can access their logs or SSH into them to check for resource issues. Also, for the jobs stuck in the "created" state, no runners have been allocated to them yet.

@dnsmichi One more interesting thing: I just configured a Kubernetes runner for my GitLab project. It showed the same behaviour (one job successful, two others stuck in the "created" state). Maybe something related to environments?

Hi, we’re experiencing the same problem with our app. We haven’t changed our config in quite some time, but 5 days ago our deployments stopped triggering after successful test builds.

We did some testing and discovered that removing the environment section of a job seems to allow the deploy job to start and finish as expected.
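For anyone trying the same workaround, this is what it looks like applied to the deploy_staging job from the config posted above: the environment block is simply commented out. Whether this is acceptable long-term depends on your use of environments, since you lose the deployment tracking they provide:

```yaml
deploy_staging:
  stage: deploy
  only:
    refs:
      - master
  image: registry.gitlab.com/gamingengine/matcher/deploy:latest
  # environment:        # removed as a workaround -- jobs with an
  #   name: staging     # environment were the ones stuck in "created"
  script:
    - deploy -v staging -j ${MATCHMAKER_CONTAINERSHIP_PERSONAL_TOKEN} -t ${CI_PIPELINE_IID}
```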

We’re currently running GitLab Runner version 11.9.0. SSH’ing into the runner during a pipeline and viewing processes via htop didn’t bring up anything unusual that we could see.

Hi,

when using gitlab.com, it could be a problem with available resources on the shared runners, or a limited amount of minutes. In case you’re using a paid tier, I’d suggest contacting support or looking into specific infrastructure issues.

Hm, maybe an upgrade to a newer version helps. I can see that the 11.x branch already has 11.11.2, though that would also imply upgrading GitLab itself to the latest 11.x release.
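On Ubuntu, assuming the runner was installed from GitLab's official apt repository (not verified for this setup), the runner upgrade would look roughly like:

```shell
# Refresh package lists and upgrade only gitlab-runner
sudo apt-get update
sudo apt-get install gitlab-runner

# Confirm the new version afterwards
gitlab-runner --version
```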

When you’re SSH’ing into the server, which distribution are you running on? On Ubuntu, /var/log/syslog is used for logging.

Cheers,
Michael

Hi, thanks for responding.

In our case I don’t think it has to do with running out of minutes, as the deploy starts and completes correctly when we remove the environment field.

We are running an Ubuntu distro (18.10, specifically). I can’t see anything in the logs that would suggest we’ve run out of resources but I’ll continue testing in a separate branch with the environment fields turned on and try to catch something in the logs.

I guess the next step is to upgrade to the latest minor patch in version 11 and go from there.
