Migrating projects/snippets/groups to Gitaly cluster failing

Hello all,

I have an installation of GitLab CE 16.8.1 with local storage, and I converted it to Gitaly Cluster + Praefect.

The architecture looks like this:
1 Praefect node + 1 PostgreSQL server (no PgBouncer; 2 DB configs for Praefect, normal and session pooled); sql-ping is OK.
3 Gitaly nodes, which can communicate with Praefect and the other way around; all hook scripts return OK.
The gitlab_shell token is the same on ALL servers, including Rails.

This is the config from Rails (local storage moved from a path => entry to a local Gitaly address):

git_data_dirs({
  "default" => {
    "gitaly_address" => "tcp://localhost:8075",
    "gitaly_token" => "xxx"
  },
  "praefect" => {
    "gitaly_address" => "tcp://10.240.0.177:2305",
    "gitaly_token" => "xxx"
  }
})

localhost is the local Gitaly service, which points to the old /repos path, and praefect is the Praefect endpoint for the Gitaly Cluster. All checks run fine from Rails.

gitlab_rails['gitaly_token']

on the Rails node holds the Praefect “external token”, which is used by clients. I double-checked, and it’s identical.

On the Praefect node:

praefect['configuration'] = {
   listen_addr: '0.0.0.0:2305',
   prometheus_listen_addr: '0.0.0.0:9652',
   prometheus_exclude_database_from_default_metrics: true,
#   tls_listen_addr: 'localhost:3305',
   auth: {
     token: 'xxx',
#     transitioning: false,
   },
   logging: {
     format: 'json',
     level: 'debug',
   },

   database: {
     host: '10.240.0.176',
     port: 5432,
     user: 'praefect',
     password: 'xxx',
     dbname: 'praefect_production',
     sslmode: 'disable',
     session_pooled: {
       host: '10.240.0.176',
       port: 5432,
       user: 'praefect',
       password: 'xxx',
       dbname: 'praefect_production',
       sslmode: 'disable'
       },
   },
   virtual_storage: [
     {
       name: 'praefect',
       node: [
         {
           storage: 'gitaly-1',
           address: 'tcp://10.240.0.196:8075',
           token: 'yyy',
         },
         {
           storage: 'gitaly-2',
           address: 'tcp://10.240.0.62:8075',
           token: 'yyy',
         },
         {
           storage: 'gitaly-3',
           address: 'tcp://10.240.0.36:8075',
           token: 'yyy',
         },

       ],
     },
   ],
}

The Gitaly servers are named gitaly-1/2/3, and all 3 are configured on each server (as per the documentation).

Setting the weight to ‘0’ on the default storage and to ‘100’ on the praefect storage in the GitLab UI works OK: when I create a new repo initialized with a README, the project ends up in the cluster, which I can also see in the logs.

The only thing that does not work is migrating ANYTHING using the API, whether it is groups, projects or snippets. The error first appears in the Gitaly logs on the cluster and propagates all the way to the Rails (Sidekiq) logs. I can see the same error in the Praefect logs on that node, so it is generated by the Gitaly daemon.
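
For reference, a move request of this kind can be sketched as follows. This is a minimal sketch, assuming the moves are scheduled through the repository storage moves endpoints of the REST API (the thread does not say which calls were used). The host name and token are placeholders, the helper name is made up for illustration, and project ID 49 is simply taken from the glRepository value in the logs.

```ruby
require "json"
require "net/http"
require "uri"

# Build (but do not send) a request that schedules a repository storage move.
# base_url and token are placeholders, not values from this thread.
def storage_move_request(base_url, kind, id, destination, token)
  unless %w[projects snippets groups].include?(kind)
    raise ArgumentError, "unsupported kind: #{kind}"
  end

  uri = URI.join(base_url, "/api/v4/#{kind}/#{id}/repository_storage_moves")
  request = Net::HTTP::Post.new(uri)
  request["PRIVATE-TOKEN"] = token # needs an administrator token
  request["Content-Type"] = "application/json"
  request.body = JSON.generate(destination_storage_name: destination)
  request
end

request = storage_move_request("https://gitlab.example.com", "projects", 49,
                               "praefect", "<admin-token>")
puts "#{request.method} #{request.path}"
puts request.body
```

The request object is only built here, so the sketch can be inspected without touching a live instance; sending it would be a separate Net::HTTP call against the real host.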

Below are the logs from the gitaly-2 server, which Praefect chose to host this repository during the migration:

{
  "component": "gitaly.UnaryServerInterceptor",
  "correlation_id": "01HNE9G4WQSG5WJMAWBB3KTJ74",
  "diskcache": "cc51c1c8-9de3-46a8-abda-b4f34468466d",
  "grpc.meta.auth_version": "v2",
  "grpc.meta.client_name": "gitlab-sidekiq",
  "grpc.meta.method_operation": "mutator",
  "grpc.meta.method_scope": "repository",
  "grpc.meta.method_type": "unary",
  "grpc.method": "ReplicateRepository",
  "grpc.request.deadline": "2024-01-31T04:58:14.341",
  "grpc.request.fullMethod": "/gitaly.RepositoryService/ReplicateRepository",
  "grpc.request.glProjectPath": "test-projects/paperless-ngx",
  "grpc.request.glRepository": "project-49",
  "grpc.request.repoPath": "@cluster/repositories/df/e6/227",
  "grpc.request.repoStorage": "gitaly-2",
  "grpc.service": "gitaly.RepositoryService",
  "grpc.start_time": "2024-01-30T22:58:14.349",
  "level": "info",
  "msg": "diskcache state change",
  "pid": 41987,
  "remote_ip": "10.240.0.164",
  "span.kind": "server",
  "system": "grpc",
  "time": "2024-01-30T22:58:15.082Z",
  "user_id": "2",
  "username": "xxx"
}
{
  "command.count": 1,
  "command.cpu_time_ms": 3,
  "command.inblock": 0,
  "command.majflt": 0,
  "command.maxrss": 318920,
  "command.minflt": 162,
  "command.oublock": 32,
  "command.real_time_ms": 4,
  "command.spawn_token_fork_ms": 0,
  "command.spawn_token_wait_ms": 0,
  "command.system_time_ms": 3,
  "command.user_time_ms": 0,
  "component": "gitaly.UnaryServerInterceptor",
  "correlation_id": "01HNE9G4WQSG5WJMAWBB3KTJ74",
  "error": "could not create repository from snapshot: creating repository: extracting snapshot: new client: could not dial source: rpc error: code = PermissionDenied desc = permission denied",
  "feature_flags": "atomic_fetch_remote exec_command_directly_in_cgroup intercept_replicate_repository mailmap_options upload_pack_boundary_bitmap_traversal use_resizable_semaphore_in_concurrency_limiter",
  "grpc.code": "Canceled",
  "grpc.meta.auth_version": "v2",
  "grpc.meta.client_name": "gitlab-sidekiq",
  "grpc.meta.method_operation": "mutator",
  "grpc.meta.method_scope": "repository",
  "grpc.meta.method_type": "unary",
  "grpc.method": "ReplicateRepository",
  "grpc.request.deadline": "2024-01-31T04:58:14.341",
  "grpc.request.fullMethod": "/gitaly.RepositoryService/ReplicateRepository",
  "grpc.request.glProjectPath": "test-projects/paperless-ngx",
  "grpc.request.glRepository": "project-49",
  "grpc.request.payload_bytes": 223,
  "grpc.request.repoPath": "@cluster/repositories/df/e6/227",
  "grpc.request.repoStorage": "gitaly-2",
  "grpc.response.payload_bytes": 0,
  "grpc.service": "gitaly.RepositoryService",
  "grpc.start_time": "2024-01-30T22:58:14.349",
  "grpc.time_ms": 733.633,
  "level": "info",
  "limit.concurrency_in_progress": 1,
  "limit.concurrency_queue_length": 0,
  "limit.concurrency_queue_ms": 0,
  "limit.limiting_key": "@cluster/repositories/df/e6/227",
  "limit.limiting_type": "per-rpc",
  "msg": "finished unary call with code Canceled",
  "pid": 41987,
  "remote_ip": "10.240.0.164",
  "span.kind": "server",
  "system": "grpc",
  "time": "2024-01-30T22:58:15.083Z",
  "user_id": "2",
  "username": "xxx"
}
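
Entries like these are easier to compare across Gitaly, Praefect and Sidekiq when they are reduced to a few fields. The following is a rough sketch, assuming each log entry is available as a single JSON line (they are pretty-printed above); the field selection and the helper name are my own choice, and the sample line reuses values from the second entry.

```ruby
require "json"

# Fields worth keeping when tracing one failed call across components.
FIELDS = %w[time correlation_id grpc.method grpc.code grpc.request.repoStorage error].freeze

# Returns a reduced hash for entries that carry an error, nil otherwise.
def summarize(line)
  entry = JSON.parse(line)
  return nil unless entry.key?("error")

  entry.slice(*FIELDS)
end

sample = JSON.generate(
  "correlation_id" => "01HNE9G4WQSG5WJMAWBB3KTJ74",
  "error" => "could not create repository from snapshot: creating repository: " \
             "extracting snapshot: new client: could not dial source: " \
             "rpc error: code = PermissionDenied desc = permission denied",
  "grpc.code" => "Canceled",
  "grpc.method" => "ReplicateRepository",
  "grpc.request.repoStorage" => "gitaly-2",
  "msg" => "finished unary call with code Canceled",
  "time" => "2024-01-30T22:58:15.083Z"
)

puts JSON.pretty_generate(summarize(sample))
```

Filtering on the shared correlation_id afterwards shows the same request as seen by each component.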

So this is the error I get, no matter which move request I send via the API:
"error": "could not create repository from snapshot: creating repository: extracting snapshot: new client: could not dial source: rpc error: code = PermissionDenied desc = permission denied",

I checked the local Gitaly folders: all are owned by user git, group git, with chmod -R 0775 applied on those 3 servers.

All versions are 16.8.1. All ports can be accessed from the Gitaly nodes to Praefect and vice versa, and Rails can access Praefect. All ports are open; this is a private network with no firewall.

Is this the intended error because I don’t have a Premium plan for moving groups, given that projects are group-dependent? As far as I saw, moving groups via the API requires a Premium plan. Or is this another problem? Which server is Gitaly trying to dial when it gets ‘permission denied’?

I wasn’t able to see that information even when I enabled GODEBUG=http2debug=2 in ‘env’.
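
One thing that may be worth ruling out here, stated as an assumption and not as something confirmed in this thread: during ReplicateRepository the destination Gitaly node dials the source storage itself, using the address the client has configured for that storage. A loopback address such as tcp://localhost:8075 would then point at the destination node rather than at the Rails host. The sketch below only flags such addresses in the git_data_dirs values quoted above; the helper name is hypothetical.

```ruby
require "ipaddr"
require "uri"

# Storage addresses as they appear in git_data_dirs in this thread.
STORAGES = {
  "default"  => "tcp://localhost:8075",
  "praefect" => "tcp://10.240.0.177:2305"
}.freeze

# True when the address only makes sense on the machine that configured it.
def loopback_address?(address)
  host = URI.parse(address).host
  return false if host.nil? # e.g. a unix socket address
  return true if host == "localhost"

  IPAddr.new(host).loopback?
rescue IPAddr::InvalidAddressError
  false # some other hostname; judging it would need a DNS lookup
end

STORAGES.each do |name, address|
  note = loopback_address?(address) ? "loopback, other nodes would reach themselves" : "not loopback"
  puts "#{name}: #{address} (#{note})"
end
```

If this turns out to apply, the token sent along would also be the one configured for that storage on the client side, which might not match the token of the node that is actually reached.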

What’s the best way in this situation to move everything to the praefect storage?

Thank you

Regarding “permission denied”: as a temporary test, can you try either of these?
1 - chmod 777 instead of 775
2 - chown -R git: /directory, to make sure it’s all owned by git, just in case

I tricked Rails by setting up both the default and praefect data dirs to point to the Praefect node, and then restored from backup. Now all repos are on the cluster, so permissions are OK. Either way, if permissions were not OK, Gitaly would not start and would complain in the logs (ask me how I know), so that is ruled out. I also ran ‘chown -R git:git /opt/gitaly’ from the start.

Creating new repos works and they land perfectly on the cluster; only moving via the API fails, with Permission Denied at DIAL SOURCE, which doesn’t give much detail about what the source is.

Glad to hear you managed to trick it 🙂
Still getting permission errors via the API? Some sorcery going on there! Could it be running as a different user?
Is there maybe some kind of service that runs or starts up as a different user?

No permission errors after I tricked it. Everything is working normally. There is no different user; all GitLab components have been running as the ‘git’ user from the start. I left the ‘default’ storage in place, as the docs say it’s needed, and I also left the ‘praefect’ storage, which was initially configured to take over. Now they both point to the Praefect node.

No idea if this is a “best practice”, but since migrating did not work, my assumption is that group migration needs a subscription, and projects cannot be migrated without groups because they are linked to them, hence the error. I am not sure that this is the case here, though. I expected an error like “unable to migrate, subscription to EE required”, not Permission denied, so I might be wrong and the error may be elsewhere.