Problem with GitLab HA active-active, multiple app servers

Hi,
I am new to GitLab and I have a problem with one of the web application server nodes in an active-active HA setup with multiple application servers.
(https://docs.gitlab.com/ce/administration/high_availability/README.html#active-active)

I installed an independent Postgres and Redis server, as well as an NFS server. The master node works fine on its own, but when I start the second node, the first node's web interface shows “Whoops, GitLab is taking too much time to respond.” (error 502), while the second node works fine. When I restart the first node (the master node), the second node shows “Whoops, GitLab is taking too much time to respond.” (error 502) and the first node works fine again.

Node ACTIVE:

[root@gitlab01 gitlab]# gitlab-ctl status
run: gitaly: (pid 26959) 11288s; run: log: (pid 26958) 11288s
run: gitlab-monitor: (pid 26971) 11288s; run: log: (pid 26970) 11288s
run: gitlab-workhorse: (pid 26961) 11288s; run: log: (pid 26960) 11288s
run: logrotate: (pid 6049) 487s; run: log: (pid 26964) 11288s
run: nginx: (pid 26963) 11288s; run: log: (pid 26962) 11288s
run: node-exporter: (pid 26969) 11288s; run: log: (pid 26968) 11288s
down: prometheus: 1s, normally up, want up; run: log: (pid 26966) 11288s
run: sidekiq: (pid 26957) 11288s; run: log: (pid 26956) 11288s
run: unicorn: (pid 26955) 11288s; run: log: (pid 26954) 11288s
[root@gitlab01 gitlab]#

[root@gitlab01 gitlab]# tail -f /var/log/gitlab/prometheus/current
2017-05-02_19:46:22.20326 time="2017-05-02T16:46:22-03:00" level=info msg="Starting prometheus (version=, branch=, revision=)" source="main.go:75"
2017-05-02_19:46:22.20327 time="2017-05-02T16:46:22-03:00" level=info msg="Build context (go=go1.5.4, user=, date=)" source="main.go:76"
2017-05-02_19:46:22.20948 time="2017-05-02T16:46:22-03:00" level=info msg="Loading configuration file /var/opt/gitlab/prometheus/prometheus.yml" source="main.go:248"
2017-05-02_19:46:22.32445 time="2017-05-02T16:46:22-03:00" level=error msg="Could not lock /var/opt/gitlab/prometheus/data/DIRTY, Prometheus already running?" source="persistence.go:198"
2017-05-02_19:46:22.32446 time="2017-05-02T16:46:22-03:00" level=error msg="Error opening memory series storage: resource temporarily unavailable" source="main.go:182"

Node showing “Whoops, GitLab is taking too much time to respond.” (error 502):

[root@gitlab02 ~]# gitlab-ctl status
run: gitaly: (pid 29449) 11288s; run: log: (pid 29448) 11288s
run: gitlab-monitor: (pid 29465) 11288s; run: log: (pid 29464) 11288s
run: gitlab-workhorse: (pid 29451) 11288s; run: log: (pid 29450) 11288s
run: logrotate: (pid 20159) 487s; run: log: (pid 29454) 11288s
run: nginx: (pid 29453) 11288s; run: log: (pid 29452) 11288s
run: node-exporter: (pid 29463) 11288s; run: log: (pid 29462) 11288s
run: prometheus: (pid 29461) 11288s; run: log: (pid 29460) 11288s
run: sidekiq: (pid 29447) 11288s; run: log: (pid 29446) 11288s
run: unicorn: (pid 29445) 11288s; run: log: (pid 29444) 11288s

[root@gitlab02 ~]# tail -f /var/log/gitlab/prometheus/current
2017-05-02_19:25:58.57040 time="2017-05-02T16:25:58-03:00" level=info msg="Checkpointing in-memory metrics and chunks..." source="persistence.go:612"
2017-05-02_19:25:58.96309 time="2017-05-02T16:25:58-03:00" level=info msg="Done checkpointing in-memory metrics and chunks in 392.67882ms." source="persistence.go:639"

[root@gitlab02 logs]# tail -f gitlab_error.log
2017/05/02 16:44:00 [error] 29507#0: *26 connect() to unix:/var/opt/gitlab/gitlab-workhorse/socket failed (111: Connection refused) while connecting to upstream, client: 167.28.133.27, server: gitlab.domain, request: "GET / HTTP/1.1", upstream: "http://unix:/var/opt/gitlab/gitlab-workhorse/socket:/", host: "167.28.191.3"

gitlab01.rb
gitlab02.rb

Could you please help me find the possible cause?
I look forward to your comments.

Best regards

Hello @rretamales, I’m having exactly the same issue. Have you been able to solve it?

Thank you

OK, found it. I had not seen the precise documentation page at https://docs.gitlab.com/ee/administration/high_availability/gitlab.html, which specifies which folders to share. I was sharing too many folders.
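
For anyone who wants to check their own nodes for the same mistake, here is a rough Python sketch that lists NFS mounts touching /var/opt/gitlab that are not on an expected list. The EXPECTED_SHARED set is my assumption of what the HA page asks you to share, so please verify it against the page linked above for your GitLab version. The function name unexpected_nfs_mounts is just a hypothetical helper, not part of GitLab.

```python
from pathlib import PurePosixPath

# ASSUMPTION: directories the HA guide expects to be shared between
# application nodes. Verify this list against the documentation page
# for your GitLab version before relying on it.
EXPECTED_SHARED = {
    "/var/opt/gitlab/.ssh",
    "/var/opt/gitlab/gitlab-rails/uploads",
    "/var/opt/gitlab/gitlab-rails/shared",
    "/var/opt/gitlab/gitlab-ci/builds",
    "/var/opt/gitlab/git-data",
}

GITLAB_ROOT = PurePosixPath("/var/opt/gitlab")


def unexpected_nfs_mounts(mounts_text, expected=EXPECTED_SHARED):
    """Return NFS mount points that overlap /var/opt/gitlab but are not expected.

    mounts_text is text in /proc/mounts format:
    "<device> <mount point> <fs type> <options> <dump> <pass>" per line.
    """
    flagged = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        if not fs_type.startswith("nfs"):
            continue  # only nfs / nfs4 mounts are of interest
        path = PurePosixPath(mount_point)
        # Mount is /var/opt/gitlab itself or something below it.
        inside_gitlab = path == GITLAB_ROOT or GITLAB_ROOT in path.parents
        # Mount is a parent directory (for example /var/opt), which would
        # put the whole GitLab data directory on NFS.
        contains_gitlab = path in GITLAB_ROOT.parents
        if (inside_gitlab or contains_gitlab) and mount_point not in expected:
            flagged.append(mount_point)
    return flagged
```

On each application node you could call it as unexpected_nfs_mounts(open("/proc/mounts").read()). An empty list means only the expected folders are shared. Anything it prints (for example /var/opt/gitlab itself, or /var/opt/gitlab/prometheus) is a folder that both nodes write to and probably should not share.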

@florian.thoni Do you have a problem with Sidekiq showing 25 of 25 jobs busy? I had that problem on 2 application servers.