GitLab Agent for Kubernetes gRPC error

Hi,

We are using the GitLab Agent to deploy to our Kubernetes clusters with Helm.

Errors like this one often pop up in our CI/CD jobs:

Error: UPGRADE FAILED: release backend failed, and has been rolled back due to atomic being set: failed to refresh resource information: GitLab Agent Server: HTTP->gRPC: failed to read gRPC response: rpc error: code = Canceled desc = context canceled

Jobs take 3 to 5 minutes to deploy, but this error occurs at random and on multiple Kubernetes clusters.

On the agent side, we see errors that resemble the one returned by the job:

{"level":"error","time":"2022-11-22T10:04:13.396Z","msg":"Error handling a connection","mod_name":"reverse_tunnel","error":"rpc error: code = Unavailable desc = error reading from server: failed to get reader: failed to read frame header: EOF","agent_id":15764}

But they never correlate with the errors we see in CI.

It’s really annoying because when this happens, the deployment is rolled back and the app can end up in an inconsistent state.

Do you know how we can prevent this?

Arnaud.

We experience the same issue: random connection problems that lead to failed or inconsistent Helm deployments. @arnaud.beun.sorare did you find a solution in the meantime?

Hi @holger,

I am still facing the issue, with errors like this:

transport.go:2242: Unsolicited response received on idle HTTP channel starting with "HTTP/1.0 400 Bad Request\nCache-Control: no-cache\nConnection: close\nContent-Type: text/html\n\n<!DOCTYPE html>\n<html>\n<head>\n  <meta content=\"width=device-width, initial-scale=1, maximum-scale=1\" name=\"viewport\">\n  <title>400 Bad Request</title>\n  <style>\n    body {\n      color: #666;\n      text-align: center;\n      font-family: \"Helvetica Neue\", Helvetica, Arial, sans-serif;\n      margin: auto;\n      font-size: 14px;\n    }\n\n    h1 {\n      font-size: 56px;\n      line-height: 100px;\n      font-weight: normal;\n      color: #456;\n    }\n\n    h2 {\n      font-size: 24px;\n      color: #666;\n      line-height: 1.5em;\n    }\n\n    h3 {\n      color: #456;\n      font-size: 20px;\n      font-weight: normal;\n      line-height: 28px;\n    }\n\n    hr {\n      max-width: 800px;\n      margin: 18px auto;\n      border: 0;\n      border-top: 1px solid #EEE;\n      border-bottom: 1px solid white;\n    }\n\n    img {\n      max-width: 40vw;\n    }\n\n    .container {\n      margin: auto 20px;\n    }\n  </style>\n</head>\n\n<body>\n  <h1>\n    <img src=\"\" alt=\"GitLab Logo\" /><br />\n    400\n  </h1>\n  <div class=\"container\">\n    <h3>Your browser sent an invalid request.</h3>\n    <hr />\n    <p>Please contact your GitLab administrator if you think this is a mistake.</p>\n  </div>\n</body>\n</html>\n<html>\n"; err=<nil>
Error: UPGRADE FAILED: release backend failed, and has been rolled back due to atomic being set: GitLab Agent Server: HTTP->gRPC: failed to read gRPC response: rpc error: code = Canceled desc = context canceled

It’s happening every day and we are looking at alternatives: using an external tool like Argo CD, or calling the Kubernetes API directly without the Agent.

NB: I am using gitlab-agent version v15.5.1.

Hi @arnaud.beun.sorare

I get this error in the GitLab job log:

Error: UPGRADE FAILED: could not get information about the resource: GitLab Agent Server: HTTP->gRPC: failed to read gRPC response: rpc error: code = Canceled desc = context canceled

And this is what I see in the GitLab Agent log:

{"level":"error","time":"2022-12-14T02:30:54.900Z","msg":"Error handling a connection","mod_name":"reverse_tunnel","error":"rpc error: code = Unavailable desc = error reading from server: failed to get reader: failed to read frame header: EOF","agent_id":10664}
{"level":"warn","time":"2022-12-14T02:30:54.900Z","msg":"GetConfiguration.Recv failed","error":"rpc error: code = Unavailable desc = error reading from server: failed to get reader: failed to read frame header: EOF","agent_id":10664}

I’m not 100% sure whether the GitLab job errors and the GitLab Agent errors are related (judging by the timestamps, they are not).

We are using GitLab shared runners from the SaaS version of GitLab, and they connect to the Agent, which is deployed in our Kubernetes cluster.
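For anyone who wants to check the timestamp correlation more systematically, here is a minimal Python sketch. It assumes the agent log is available as JSON lines in the format pasted above, and that you know roughly when the CI job failed. The function names, the 60-second window, and the job failure time in the usage example are my own illustrative choices and not something from GitLab.

```python
import json
from datetime import datetime, timedelta

# Two agent log lines copied from this thread, used as sample input.
SAMPLE_AGENT_LOG = '''
{"level":"error","time":"2022-12-14T02:30:54.900Z","msg":"Error handling a connection","mod_name":"reverse_tunnel","error":"rpc error: code = Unavailable desc = error reading from server: failed to get reader: failed to read frame header: EOF","agent_id":10664}
{"level":"warn","time":"2022-12-14T02:30:54.900Z","msg":"GetConfiguration.Recv failed","error":"rpc error: code = Unavailable desc = error reading from server: failed to get reader: failed to read frame header: EOF","agent_id":10664}
'''


def parse_time(value):
    """Parse a timestamp such as 2022-12-14T02:30:54.900Z into an aware datetime."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f%z")


def agent_errors_near(log_lines, job_failure_time, window_seconds=60):
    """Return error/warn log entries within window_seconds of job_failure_time.

    Hypothetical helper: lines that are not valid JSON or lack a "time"
    field are skipped silently.
    """
    window = timedelta(seconds=window_seconds)
    matches = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if entry.get("level") not in ("error", "warn") or "time" not in entry:
            continue
        if abs(parse_time(entry["time"]) - job_failure_time) <= window:
            matches.append(entry)
    return matches


if __name__ == "__main__":
    # Assumed failure time of the CI job, for illustration only.
    failed_at = parse_time("2022-12-14T02:31:20.000Z")
    for entry in agent_errors_near(SAMPLE_AGENT_LOG.splitlines(), failed_at):
        print(entry["time"], entry["level"], entry["msg"])
```

If this returns nothing for a real job failure time even with a generous window, that supports the impression that the agent-side EOF errors are a separate symptom.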

Are you using shared runners as well or do you host dedicated runners?

Hi @holger,

No, I am using private runners.

I have tried upgrading the agent to the latest version, but it doesn’t change anything.

Hi @arnaud.beun.sorare
The problems are most likely related to this issue: CI deployment error related to GitLab Agent (#279), in the GitLab.org / cluster-integration / GitLab Agent for Kubernetes project.
A fix is on the way…

Thank you @holger for the update.

We are really struggling with it. I hope they will fix it soon.