Docker gitlab-ce: random 500 and 502 errors after upgrading to 16.7.7

Problem to solve

I spent the last week upgrading our GitLab from 14.9.2 to 16.7.7, waiting after each step for the background migrations to finish before moving on to the next version:
14.9.2 > 14.9.5 > 14.10.5 > 15.0.5 > 15.4.6 > 15.11.13 > 16.0.8 > > 16.3.7 > 16.7.7

Everything appeared to be going well. However, today the GitLab server keeps returning 500 and 502 errors, seemingly at random. Refreshing the page cycles unpredictably between the working page, the 500 page, and the 502 page.

The GitLab health check reports healthy whenever the page does load.

Steps to reproduce

Refresh any page several times; some of the requests return 500 or 502.

Configuration

Running GitLab in Docker with docker compose:

$ cat docker-compose.yaml
version: "3"

services:
  web:
    image: 'gitlab/gitlab-ce:16.7.7-ce.0'
    restart: always
    environment:
      GITLAB_OMNIBUS_CONFIG: |
        external_url 'https://gitlab.syr.criticallink.com'
        gitlab_rails['gitlab_shell_ssh_port'] = 2224
    ports:
      - '10.0.0.62:80:80'
      - '10.0.0.62:443:443'
      - '10.0.0.62:2224:22'
    volumes:
      - './config:/etc/gitlab'
      - './logs:/var/log/gitlab'
      - './data:/var/opt/gitlab'
      - '/etc/ssl/certs:/etc/ssl/certs:ro'
      - '/etc/ssl/private:/etc/ssl/private:ro'

Versions

$ docker compose exec web gitlab-rake gitlab:env:info

System information
System:
Current User:   git
Using RVM:      no
Ruby Version:   3.1.4p223
Gem Version:    3.4.22
Bundler Version:2.4.22
Rake Version:   13.0.6
Redis Version:  7.0.15
Sidekiq Version:6.5.12
Go Version:     unknown

GitLab information
Version:        16.7.7
Revision:       5fb02de437c
Directory:      /opt/gitlab/embedded/service/gitlab-rails
DB Adapter:     PostgreSQL
DB Version:     13.13
URL:            https://gitlab.syr.criticallink.com
HTTP Clone URL: https://gitlab.syr.criticallink.com/some-group/some-project.git
SSH Clone URL:  ssh://git@gitlab.syr.criticallink.com:2224/some-group/some-project.git
Using LDAP:     no
Using Omniauth: yes
Omniauth Providers: google_oauth2

GitLab Shell
Version:        14.32.0
Repository storages:
- default:      unix:/var/opt/gitlab/gitaly/gitaly.socket
GitLab Shell path:              /opt/gitlab/embedded/service/gitlab-shell

Gitaly
- default Address:      unix:/var/opt/gitlab/gitaly/gitaly.socket
- default Version:      16.7.7
- default Git Version:  2.42.0

Upgraded to 16.10.1 and the server has been stable for about 3 hours now…

It started happening again this morning. It looks like the cause was Docker's default shm_size; I noticed the value is increased in the newer docker compose examples. After raising it, the server has been stable so far.
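
For anyone hitting the same thing, here is a minimal sketch of the change against the compose file above. The value 256m is an assumption based on what the newer GitLab compose examples appear to use; it is not necessarily the right number for every instance. Docker's default for a container's /dev/shm is 64 MB.

```yaml
services:
  web:
    # ... existing image, restart, environment, ports, and volumes stay unchanged ...
    # Raise the container's /dev/shm above Docker's 64 MB default.
    # '256m' is an assumed value; adjust it for your instance.
    shm_size: '256m'
```

The container has to be recreated for this to take effect (docker compose up -d). Running df -h /dev/shm inside the container should then report the new size.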