My gitlab-ci.yml job failed (ERROR: Job failed: exit status 1) without enough debug information

My GitLab CI job failed. The message I have is ERROR: Job failed: exit status 1. This message is not informative enough for me to troubleshoot the error.

I am implementing CI/CD for a Node.js Express application. Before I build and deploy the server application, I gracefully stop (shut down) the instance of the application that is currently running.

This is what I am trying to do inside the stop-job.

However, when I run the stop job, GitLab Runner fails with the message ERROR: Job failed: exit status 1.

This is my code, inside .gitlab-ci.yml:

stages:          # List of stages for jobs, and their order of execution
  - stop

stop-job:       # This job runs in the <stop> stage, which runs first.
  stage: stop
  script:
    - echo 'Stopping job ...'

    # Send a kill / shutdown message to a server application listening on port 80
    - echo 'shutdown' | nc localhost 3000 || echo 'No process listening on port 3000'

    - |
      while true; do
          # Count the process using port 80
          process_count=$(lsof -i :80 | grep LISTEN | wc -l)
          
          # Check if no process is using port 80
          if [ "$process_count" -eq 0 ]; then
              echo "There is no application or process using port 80"
              break
          fi
      
          echo "Port 80 is currently in use. Retrying in 5 seconds..."
          # Wait 5 seconds before we retry again
          sleep 5
      done
      
      # We are out of the loop and the current instance using port 80 is closed
    - echo "Stopping job completed!"
  only:
    - main

I believe the error happens around this part of the code.

      while true; do
          # Count the process using port 80
          process_count=$(lsof -i :80 | grep LISTEN | wc -l)

When the command lsof -i :80 exits with status 1, it typically indicates that no process is currently listening on port 80. While it may seem counterintuitive, an exit status of 1 in this context does not necessarily indicate a command execution failure. Instead, it signifies that lsof did not find any open files or connections associated with port 80.

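set -o pipefail was set in the shell. lsof -i :80 returns 1 to indicate that no process is using port 80, and pipefail tells the shell to treat any failure in a pipeline as a failure of the whole pipeline (rather than only using the last command’s exit status). As a result, the pipeline that computes process_count was reported as failed even though nothing had actually gone wrong, and the job stopped with exit status 1.
That was my problem.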

If running with set -o pipefail, a failure at any stage in a shell pipeline will cause the entire pipeline to be considered failed.

To turn this off for the remainder of the current script, use set +o pipefail.
To turn it back on for the remainder of the current script, use set -o pipefail.
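
The difference between the two settings can be seen in a small standalone sketch. It is not part of the original job: false is only a stand-in for a command that exits non-zero (as lsof does when it finds nothing), and the variable names status_without and status_with are made up for illustration.

```shell
#!/usr/bin/env bash
# 'false' stands in for a command that exits non-zero, like lsof finding no match.

# Without pipefail, the pipeline reports the status of its last command (wc).
set +o pipefail
status_without=0
false | wc -l > /dev/null || status_without=$?

# With pipefail, the non-zero status of 'false' becomes the pipeline's status.
set -o pipefail
status_with=0
false | wc -l > /dev/null || status_with=$?
set +o pipefail

echo "without pipefail: $status_without"   # prints: without pipefail: 0
echo "with pipefail:    $status_with"      # prints: with pipefail:    1
```

The pipeline itself is identical in both cases; only the shell option decides whether the first command's status is visible to the caller.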

The following is the part of the script that needed the correction:

...

          # Count the process using port 80
          set +o pipefail
          process_count=$(lsof -i :80 | grep LISTEN | wc -l)
          set -o pipefail
          
...
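
As a possible alternative to toggling pipefail, the expected non-zero statuses can be absorbed where they occur. This is only a sketch under assumptions: it presumes the job script runs in bash with errexit and pipefail enabled, and count_listen_lines is a hypothetical helper name, not something from the original script.

```shell
#!/usr/bin/env bash
set -eo pipefail   # assumption: mirrors the strict mode the job script runs under

# Hypothetical helper: counts the lines on stdin that contain LISTEN.
# grep -c prints 0 but exits 1 when nothing matches, so '|| true' absorbs that status.
count_listen_lines() {
    grep -c LISTEN || true
}

# '|| true' inside the braces absorbs the non-zero status lsof returns
# when nothing is using the port, so the pipeline as a whole succeeds.
process_count=$( { lsof -i :80 2>/dev/null || true; } | count_listen_lines )
echo "Listeners on port 80: $process_count"
```

With this shape pipefail stays on for the rest of the script, at the cost of also hiding a genuine lsof failure (for example, lsof not being installed), which would then be counted as zero listeners.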