Huge amount of requests in logs after 16.5 update

Hi,

After the 16.5 update, we can see a lot of requests that are generating a huge amount of log entries in production_json.log and graphql_json.log, and a heavy load on the server.

I don’t know what the source of this is, and it is not happening for all users.
Also, the issue number in the example below is not the only one affected, but, again, it might be related to a specific project.

What I was able to find so far is that this behavior can be triggered by inactive browser tabs and might be related to specific projects.

Does anyone have any idea what the reason for this is?

{
  "method": null,
  "path": null,
  "format": null,
  "params": [
    {
      "key": "query",
      "value": "subscription issuableAssigneesUpdated($issuableId: IssuableID!) {\n  issuableAssigneesUpdated(issuableId: $issuableId) {\n    ... on Issue {\n      id\n      assignees {\n        nodes {\n          ...User\n          status {\n            availability\n            __typename\n          }\n          __typename\n        }\n        __typename\n      }\n      __typename\n    }\n    __typename\n  }\n}\n\nfragment User on User {\n  id\n  avatarUrl\n  name\n  username\n  webUrl\n  __typename\n}\n"
    },
    {
      "key": "variables",
      "value": {
        "issuableId": "gid://gitlab/Issue/4513"
      }
    },
    {
      "key": "operationName",
      "value": "issuableAssigneesUpdated"
    },
    {
      "key": "nonce",
      "value": "74425feb-8d74-4c39-ac56-d7a52e889e23"
    }
  ],
  "controller": "GraphqlChannel",
  "action": "subscribe",
  "status": 200,
  "time": "2023-10-25T12:29:17.921Z",
  "remote_ip": "10.x.x.x",
  "ua": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36",
  "db_count": 0,
  "db_write_count": 0,
  "db_cached_count": 0,
  "db_replica_count": 0,
  "db_primary_count": 0,
  "db_main_count": 0,
  "db_ci_count": 0,
  "db_main_replica_count": 0,
  "db_ci_replica_count": 0,
  "db_replica_cached_count": 0,
  "db_primary_cached_count": 0,
  "db_main_cached_count": 0,
  "db_ci_cached_count": 0,
  "db_main_replica_cached_count": 0,
  "db_ci_replica_cached_count": 0,
  "db_replica_wal_count": 0,
  "db_primary_wal_count": 0,
  "db_main_wal_count": 0,
  "db_ci_wal_count": 0,
  "db_main_replica_wal_count": 0,
  "db_ci_replica_wal_count": 0,
  "db_replica_wal_cached_count": 0,
  "db_primary_wal_cached_count": 0,
  "db_main_wal_cached_count": 0,
  "db_ci_wal_cached_count": 0,
  "db_main_replica_wal_cached_count": 0,
  "db_ci_replica_wal_cached_count": 0,
  "db_replica_duration_s": 0,
  "db_primary_duration_s": 0,
  "db_main_duration_s": 0,
  "db_ci_duration_s": 0,
  "db_main_replica_duration_s": 0,
  "db_ci_replica_duration_s": 0,
  "pid": 482195,
  "worker_id": "puma_1",
  "rate_limiting_gates": [],
  "correlation_id": "01HDKC38CX93F5VP9Y2J79JSZX",
  "duration_s": 0.01571
}
{
  "method": null,
  "path": null,
  "format": null,
  "params": [
    {
      "key": "query",
      "value": "subscription issuableAssigneesUpdated($issuableId: IssuableID!) {\n  issuableAssigneesUpdated(issuableId: $issuableId) {\n    ... on Issue {\n      id\n      assignees {\n        nodes {\n          ...User\n          status {\n            availability\n            __typename\n          }\n          __typename\n        }\n        __typename\n      }\n      __typename\n    }\n    __typename\n  }\n}\n\nfragment User on User {\n  id\n  avatarUrl\n  name\n  username\n  webUrl\n  __typename\n}\n"
    },
    {
      "key": "variables",
      "value": {
        "issuableId": "gid://gitlab/Issue/4513"
      }
    },
    {
      "key": "operationName",
      "value": "issuableAssigneesUpdated"
    },
    {
      "key": "nonce",
      "value": "6d6fba50-fa79-46dd-9950-66d8e4ad43e2"
    }
  ],
  "controller": "GraphqlChannel",
  "action": "unsubscribe",
  "status": 200,
  "time": "2023-10-25T12:29:17.918Z",
  "remote_ip": "10.x.x.x",
  "ua": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36",
  "db_count": 0,
  "db_write_count": 0,
  "db_cached_count": 0,
  "db_replica_count": 0,
  "db_primary_count": 0,
  "db_main_count": 0,
  "db_ci_count": 0,
  "db_main_replica_count": 0,
  "db_ci_replica_count": 0,
  "db_replica_cached_count": 0,
  "db_primary_cached_count": 0,
  "db_main_cached_count": 0,
  "db_ci_cached_count": 0,
  "db_main_replica_cached_count": 0,
  "db_ci_replica_cached_count": 0,
  "db_replica_wal_count": 0,
  "db_primary_wal_count": 0,
  "db_main_wal_count": 0,
  "db_ci_wal_count": 0,
  "db_main_replica_wal_count": 0,
  "db_ci_replica_wal_count": 0,
  "db_replica_wal_cached_count": 0,
  "db_primary_wal_cached_count": 0,
  "db_main_wal_cached_count": 0,
  "db_ci_wal_cached_count": 0,
  "db_main_replica_wal_cached_count": 0,
  "db_ci_replica_wal_cached_count": 0,
  "db_replica_duration_s": 0,
  "db_primary_duration_s": 0,
  "db_main_duration_s": 0,
  "db_ci_duration_s": 0,
  "db_main_replica_duration_s": 0,
  "db_ci_replica_duration_s": 0,
  "pid": 482195,
  "worker_id": "puma_1",
  "rate_limiting_gates": [],
  "correlation_id": "01HDKC38CX93F5VP9Y2J79JSZX",
  "duration_s": 0.00021
}
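
For anyone who wants to quantify this on their own instance, below is a minimal Python sketch that counts GraphqlChannel entries per client IP and GraphQL operation. The log path is the default Omnibus location and only an assumption, so adjust it for your installation; the script also assumes one JSON object per line (the samples above are pretty-printed for readability).

#!/usr/bin/env python3
# Minimal sketch: count GraphqlChannel subscribe/unsubscribe entries per client
# IP and GraphQL operation in production_json.log. LOG_PATH is an assumption
# (default Omnibus location); adjust it for your installation.
import json
from collections import Counter

LOG_PATH = "/var/log/gitlab/gitlab-rails/production_json.log"  # assumption

counts = Counter()
with open(LOG_PATH, encoding="utf-8") as f:
    for line in f:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partially written or non-JSON lines
        if entry.get("controller") != "GraphqlChannel":
            continue
        params = {p.get("key"): p.get("value") for p in entry.get("params") or []}
        counts[(entry.get("remote_ip"), params.get("operationName"), entry.get("action"))] += 1

for (ip, operation, action), n in counts.most_common(20):
    print(f"{n:8d}  {ip}  {operation}  {action}")

Grouping by remote_ip makes it easier to see whether the traffic comes from a few stale browser sessions or from everyone.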

Hi @AdyR9 ,

Have you found any cause for this? We upgraded our GitLab server to 16.5 last week and are now experiencing the same issue. Our production_json.log grows to as much as 100 GB in a day.

Hi @markdbrown

I didn’t find the reason or a fix, just a workaround.
This occurs when GitLab tabs (probably tabs with pipeline pages or other pages with auto-refresh) stay open in the browser and the user reconnects to the VPN or wakes their laptop; then the requests start being sent to the server.
So, to avoid this, ask people to close their tabs or refresh them.

Also, the pipeline page has issues: open a project with pipelines and go to the pipelines page, open a new tab with anything else, then go back to the pipelines tab and an auto-refresh will occur. This generates a flood of requests and the pipelines won’t be displayed (response code 429). You also can’t click on any stage status. This doesn’t occur if the cache is disabled for that tab, meaning there is an issue with how the cache is being used/sent to the server. The workaround is to open the browser developer tools for the tab and disable the cache in the Network section.

I hope this will be fixed, or at least that someone can give more details on why it occurs.

I would suggest opening an issue here so that the Devs can be made aware of the problem and potentially fix it: Issues · GitLab.org / GitLab · GitLab

You may wish to search the issues first to make sure someone didn’t open one already. Devs don’t address issues mentioned on the forum, since they only work on issues that have been opened in the tracker.

When you open the issue, it would be good to link this thread in it, so the developers can see that people are experiencing the problem, and so that people reading this thread can up-vote the issue and show how many users are affected. Also link the issue you open back in this thread so that people can find it easily to up-vote.

We are experiencing the same issue. production_json.log and graphql_json.log are consuming a huge amount of disk space very quickly.

I created an issue to hopefully get this looked at: Huge amount of requests in logs after 16.5 update (#430877) · Issues · GitLab.org / GitLab · GitLab

Thanks @iwalker for the suggestion and @markdbrown for opening the issue.

Hi @AdyR9 ,

Looks like I’m facing the same problem. Yesterday GitLab 16.3.5 was updated to 16.5.1. The update went without errors, and there were no background migrations during the update.
I can’t tell much from the size of the log files (in a couple of hours they were about 2 GB), but the total CPU load has increased to 80-85%. For the past 2 hours MergeRequestCleanupRefsWorker jobs have been running. Stopping the jobs helps for about one minute, after which they resume.

As a result, in 2 hours the background job history shows more than 70 thousand processed jobs, which works out to roughly 2.5 times more per day than usual on our server.

Can you tell me if you also have MergeRequestCleanupRefsWorker jobs among your background jobs?
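
If it helps to compare, here is a rough sketch of how the job classes could be tallied from the Sidekiq JSON log. The path and the "class"/"job_status" field names are assumptions based on a default Omnibus install, so adjust them for your setup:

#!/usr/bin/env python3
# Rough sketch: tally completed Sidekiq jobs per worker class from the Sidekiq
# JSON log. LOG_PATH and the "class"/"job_status" field names are assumptions
# based on a default Omnibus install.
import json
from collections import Counter

LOG_PATH = "/var/log/gitlab/sidekiq/current"  # assumption

per_class = Counter()
with open(LOG_PATH, encoding="utf-8") as f:
    for line in f:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines
        if entry.get("job_status") == "done":  # count only finished jobs
            per_class[entry.get("class", "unknown")] += 1

for worker, n in per_class.most_common(15):
    print(f"{n:8d}  {worker}")

A single worker class dominating that list would at least confirm where the extra jobs are coming from.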

I found my problem described in another topic (GitLab CPU usage by sidekiq over 50%). I would like to know if you are seeing something similar.

In earlier versions the CPU usage was around 15%

GitLab CE version 16.5.1
CentOS 7
4 CPU cores, 16 GB RAM

I had to make an urgent rollback to the previous version.

We are also experiencing the issue. Running 16.5.

In our case it happens when we have a specific issue page open in our GitLab instance. The page has some uploaded files and images. We can see in the access logs and in production_json.log that GitLab returns status 304 for these “automatic” requests:
{"method":"GET","path":"/project-abc/-/issues/1342/realtime_changes","format":"json","controller":null,"action":null,"status":304,"time":"2023-11-14T08:34:34.299Z"
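
To get a feel for how often these requests come in, a small sketch like the one below can bucket the realtime_changes requests per minute. The log path is the default Omnibus location and only an assumption; the script expects one JSON object per line.

#!/usr/bin/env python3
# Sketch: count realtime_changes requests per minute to see how bursty the
# "automatic" polling is. LOG_PATH is an assumption (default Omnibus location).
import json
from collections import Counter

LOG_PATH = "/var/log/gitlab/gitlab-rails/production_json.log"  # assumption

per_minute = Counter()
with open(LOG_PATH, encoding="utf-8") as f:
    for line in f:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines
        path = entry.get("path") or ""
        if path.endswith("/realtime_changes"):
            # "time" looks like "2023-11-14T08:34:34.299Z"; keep it up to the minute
            per_minute[(entry.get("time") or "")[:16]] += 1

for minute, n in sorted(per_minute.items()):
    print(f"{minute}  {n}")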

For me it happened after updating from version 16.5.0 to version 16.5.1, so the reason must be somewhere between those versions.

Anyone know if this is still happening in version 16.5.2?

Hi @czqrny, @Alex23 – this issue should have been resolved in 16.5.1, so we’re trying to understand why you’re still seeing it in that version. As this was ultimately caused by a frontend dependency, it could still happen from tabs that had been opened before 16.5.1 and were still open. Any chance that is the case for you? If so, the tabs should ideally be hard-refreshed (usually Shift + Refresh).

If not, can you double check once more that you are still experiencing it in 16.5.1 and that the logs are filling up with the same type of message that is mentioned in the original post?

Hi @manuelgrabowski,

In my case the problem went away after ~2 days (see the attached graph of disk usage in GB over time). On the morning of 15.11.2023 I updated my production instance from version 16.5.0 to 16.5.1. During the first day on the new version, disk usage went from ~10 GB to ~120 GB (I deleted the journal and the GitLab logs manually). It is worth mentioning that GitLab itself caused only half of this logging; the second half was copied to the Linux journal, because my instance runs in a Docker container configured as a systemd service.

My second problem was that GitLab log rotation did not work because the logrotate configuration file had the wrong owner (the git user instead of root), which I have now also fixed thanks to this “bug”.
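
For anyone who wants to check the same thing, here is a small sketch that just prints the owner of each logrotate config file so a wrong owner is easy to spot. Both candidate directories are assumptions (the host-level /etc/logrotate.d and the Omnibus-bundled location); point them at wherever your logrotate configuration actually lives.

#!/usr/bin/env python3
# Small sketch: print the owner of each logrotate config file so a wrong owner
# (e.g. git instead of root) stands out. The candidate paths are assumptions.
from pathlib import Path
import pwd

CANDIDATES = [
    Path("/etc/logrotate.d"),                       # host-level configs (assumption)
    Path("/var/opt/gitlab/logrotate/logrotate.d"),  # Omnibus-bundled logrotate (assumption)
]

for directory in CANDIDATES:
    if not directory.is_dir():
        continue
    for config in sorted(directory.iterdir()):
        stat = config.stat()
        try:
            owner = pwd.getpwuid(stat.st_uid).pw_name
        except KeyError:
            owner = str(stat.st_uid)  # uid without a passwd entry
        flag = "" if owner == "root" else "  <-- not owned by root"
        print(f"{owner:10s}  {config}{flag}")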

It seems that after ~2 days the problem was gone, so maybe all of my users refreshed their tabs?

That doesn’t explain why it happened to me on version 16.5.1 and not 16.5.0 (which I had been using for 3 weeks before the update to 16.5.1).

Docker image on which my GitLab instance is working is: gitlab/gitlab-ce:16.5.1-ce.0 (ID: sha256:2356d4f2a386a59fbf588c2f7611b298ebf42b65e2af41ca5b1084565d8c7550)

Gotcha, thanks. Glad to hear you’re at least not suffering from it anymore. I wonder if not being affected by it in 16.5.0 could also somehow be due to caching, but it seems unlikely that for three weeks after the update none of the users would have received the new frontend code at some point.
