PG error when trying to log in

Solved. Solution in my reply

Since the 1st of June, every user who needs to re-authenticate receives the same error message saying

Could not authenticate you from Ldapmain because "PG::CheckViolation: ERROR: no partition of relation "audit_events" found for row DETAIL: Partition key of the failing row contains (created_at) = (2022-06-08 08:17:41.621628).".

Steps taken

  • I checked the database (table audit_events); the last logged entry is from May 31st (see the query below).
  • I tried to restore from a backup from May 30th, which did not fix the problem.
  • I worked through the applicable steps from the troubleshooting guide and the documentation.
  • A colleague suspects something happened with the change of month, like a validity period running out.
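
For reference, a quick way to check the newest audit entry (assuming direct database access, e.g. via sudo gitlab-psql on an Omnibus install):

SELECT max(created_at) FROM audit_events;

In my case this returned a timestamp from May 31st.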

System information:

  • GitLab: 13.2.10-ee (8cd5b450a0f) EE (Omnibus)
  • GitLab Shell: 13.3.0
  • PostgreSQL: 11.7
  • SLES12 SP4
  • LDAP Auth

Logs etc.

Condensed log

gitlab-ctl tail | tee --append /tmp/gitlab_tail.log

==> /var/log/gitlab/logrotate/current <==

==> /var/log/gitlab/logrotate/state <==

==> /var/log/gitlab/unicorn/current <==
2021-11-02_14:01:41.97985 Waiting for old unicorn to die…
2021-11-02_14:01:42.98125 Waiting for old unicorn to die…
2021-11-02_14:01:44.99291 wrapper for unicorn master 6622 exiting
2021-11-02_14:01:45.35092 adopted existing unicorn master 7479
2021-11-02_16:17:18.86483 Received TERM from runit, sending to process group (-PID)
2021-11-02_16:17:19.15237 forwarding TERM to unicorn master 7479
2021-11-02_16:17:19.17730 Waiting for unicorn to die…
2021-11-02_16:17:20.18599 forwarding CONT to unicorn master 7479
2021-11-02_16:17:20.18600 /opt/gitlab/embedded/bin/gitlab-unicorn-wrapper: line 132: kill: (7479) - No such process
2021-11-02_16:17:20.18726 wrapper for unicorn master 7479 exiting

==> /var/log/gitlab/unicorn/unicorn_stdout.log <==

==> /var/log/gitlab/unicorn/unicorn_stderr.log <==

==> /var/log/gitlab/unicorn/state <==

==> /var/log/gitlab/puma/puma_stdout.log <==
{"timestamp":"2022-06-08T08:29:11.685Z","pid":1470,"message":"PumaWorkerKiller: Consuming 2176.78515625 mb with master and 2 workers."}
{"timestamp":"2022-06-08T08:29:31.686Z","pid":1470,"message":"PumaWorkerKiller: Consuming 2177.75 mb with master and 2 workers."}
{"timestamp":"2022-06-08T08:29:51.688Z","pid":1470,"message":"PumaWorkerKiller: Consuming 2177.7421875 mb with master and 2 workers."}
{"timestamp":"2022-06-08T08:30:11.690Z","pid":1470,"message":"PumaWorkerKiller: Consuming 2178.2109375 mb with master and 2 workers."}
{"timestamp":"2022-06-08T08:30:31.691Z","pid":1470,"message":"PumaWorkerKiller: Consuming 2179.21875 mb with master and 2 workers."}
{"timestamp":"2022-06-08T08:30:51.693Z","pid":1470,"message":"PumaWorkerKiller: Consuming 2179.671875 mb with master and 2 workers."}
{"timestamp":"2022-06-08T08:31:11.694Z","pid":1470,"message":"PumaWorkerKiller: Consuming 2179.83984375 mb with master and 2 workers."}
{"timestamp":"2022-06-08T08:31:31.696Z","pid":1470,"message":"PumaWorkerKiller: Consuming 2180.84765625 mb with master and 2 workers."}
{"severity":"ERROR","timestamp":"2022-06-08T08:31:32.284Z","pid":1854,"progname":"omniauth","message":"(ldapmain) Authentication failure! ldap_error: ActiveRecord::StatementInvalid, PG::CheckViolation: ERROR: no partition of relation \"audit_events\" found for row\nDETAIL: Partition key of the failing row contains (created_at) = (2022-06-08 08:31:32.274532).\n"}
{"timestamp":"2022-06-08T08:31:51.697Z","pid":1470,"message":"PumaWorkerKiller: Consuming 2181.9140625 mb with master and 2 workers."}

==> /var/log/gitlab/puma/current <==
2022-06-08_07:42:50.28435 {"timestamp":"2022-06-08T07:42:50.283Z","pid":1470,"message":"* Environment: production"}
2022-06-08_07:42:50.28435 {"timestamp":"2022-06-08T07:42:50.283Z","pid":1470,"message":"* Process workers: 2"}
2022-06-08_07:42:50.28436 {"timestamp":"2022-06-08T07:42:50.283Z","pid":1470,"message":"* Preloading application"}
2022-06-08_07:45:11.10374 {"timestamp":"2022-06-08T07:45:11.103Z","pid":1470,"message":"* Listening on unix:///var/opt/gitlab/gitlab-rails/sockets/gitlab.socket"}
2022-06-08_07:45:11.10410 {"timestamp":"2022-06-08T07:45:11.103Z","pid":1470,"message":"* Listening on tcp://127.0.0.1:8080"}
2022-06-08_07:45:11.10419 {"timestamp":"2022-06-08T07:45:11.104Z","pid":1470,"message":"! WARNING: Detected 3 Thread(s) started in app boot:"}
2022-06-08_07:45:11.10437 {"timestamp":"2022-06-08T07:45:11.104Z","pid":1470,"message":"! #\u003cThread:0x00007fdafdfb1ff0@/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/concurrent-ruby-1.1.6/lib/concurrent-ruby/concurrent/atomic/ruby_thread_local_var.rb:38 sleep_forever\u003e - /opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/concurrent-ruby-1.1.6/lib/concurrent-ruby/concurrent/atomic/ruby_thread_local_var.rb:40:in `pop'"}
2022-06-08_07:45:11.10461 {"timestamp":"2022-06-08T07:45:11.104Z","pid":1470,"message":"! #\u003cThread:0x00007fdaef201828@/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/activerecord-6.0.3.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:334 sleep\u003e - /opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/activerecord-6.0.3.1/lib/active_record/connection_adapters/abstract/connection_pool.rb:337:in `sleep'"}
2022-06-08_07:45:11.10477 {"timestamp":"2022-06-08T07:45:11.104Z","pid":1470,"message":"! #\u003cThread:0x00007fdadd524a30@/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/rack-timeout-0.5.2/lib/rack/timeout/support/scheduler.rb:73 sleep\u003e - /opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/rack-timeout-0.5.2/lib/rack/timeout/support/scheduler.rb:91:in `sleep'"}
2022-06-08_07:45:11.10496 {"timestamp":"2022-06-08T07:45:11.104Z","pid":1470,"message":"Use Ctrl-C to stop"}

==> /var/log/gitlab/puma/puma_stderr.log <==
from /opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/bundler-1.17.3/exe/bundle:22:in `<top (required)>'
from /opt/gitlab/embedded/bin/bundle:23:in `load'
from /opt/gitlab/embedded/bin/bundle:23:in `<main>'
=== puma startup: 2022-06-08 09:45:11 +0200 ===
WARNING: Active Record does not support composite primary key.

audit_events has composite primary key. Composite primary key is ignored.
WARNING: Active Record does not support composite primary key.

audit_events has composite primary key. Composite primary key is ignored.

==> /var/log/gitlab/puma/state <==

==> /var/log/gitlab/grafana/current <==
2022-06-08_07:42:44.67699 t=2022-06-08T09:42:44+0200 lvl=info msg="Path Home" logger=settings path=/var/opt/gitlab/grafana
2022-06-08_07:42:44.67699 t=2022-06-08T09:42:44+0200 lvl=info msg="Path Data" logger=settings path=/var/opt/gitlab/grafana/data
2022-06-08_07:42:44.67700 t=2022-06-08T09:42:44+0200 lvl=info msg="Path Logs" logger=settings path=/var/log/gitlab/grafana
2022-06-08_07:42:44.67700 t=2022-06-08T09:42:44+0200 lvl=info msg="Path Plugins" logger=settings path=/var/opt/gitlab/grafana/data/plugins
2022-06-08_07:42:44.67700 t=2022-06-08T09:42:44+0200 lvl=info msg="Path Provisioning" logger=settings path=/var/opt/gitlab/grafana/provisioning
2022-06-08_07:42:44.67701 t=2022-06-08T09:42:44+0200 lvl=info msg="App mode production" logger=settings
2022-06-08_07:42:44.67701 t=2022-06-08T09:42:44+0200 lvl=info msg="Connecting to DB" logger=sqlstore dbtype=sqlite3
2022-06-08_07:42:44.67701 t=2022-06-08T09:42:44+0200 lvl=info msg="Starting DB migration" logger=migrator
2022-06-08_07:42:45.25951 t=2022-06-08T09:42:45+0200 lvl=info msg="Starting plugin search" logger=plugins
2022-06-08_07:42:47.04405 t=2022-06-08T09:42:47+0200 lvl=info msg="HTTP Server Listen" logger=http.server address=127.0.0.1:3000 protocol=http subUrl=/-/grafana socket=

==> /var/log/gitlab/grafana/state <==

==> /var/log/gitlab/redis/current <==
2022-06-08_08:22:49.06010 1490:M 08 Jun 2022 10:22:49.058 * 10 changes in 300 seconds. Saving…
2022-06-08_08:22:49.06016 1490:M 08 Jun 2022 10:22:49.059 * Background saving started by pid 4487
2022-06-08_08:22:49.08599 4487:C 08 Jun 2022 10:22:49.085 * DB saved on disk
2022-06-08_08:22:49.08685 4487:C 08 Jun 2022 10:22:49.086 * RDB: 6 MB of memory used by copy-on-write
2022-06-08_08:22:49.16191 1490:M 08 Jun 2022 10:22:49.161 * Background saving terminated with success
2022-06-08_08:27:50.07615 1490:M 08 Jun 2022 10:27:50.076 * 10 changes in 300 seconds. Saving…
2022-06-08_08:27:50.07764 1490:M 08 Jun 2022 10:27:50.077 * Background saving started by pid 4754
2022-06-08_08:27:50.10384 4754:C 08 Jun 2022 10:27:50.103 * DB saved on disk
2022-06-08_08:27:50.10521 4754:C 08 Jun 2022 10:27:50.105 * RDB: 7 MB of memory used by copy-on-write
2022-06-08_08:27:50.18226 1490:M 08 Jun 2022 10:27:50.179 * Background saving terminated with success

==> /var/log/gitlab/redis/state <==

==> /var/log/gitlab/puma/puma_stdout.log <==
{"severity":"ERROR","timestamp":"2022-06-08T08:31:59.808Z","pid":1854,"progname":"omniauth","message":"(ldapmain) Authentication failure! ldap_error: ActiveRecord::StatementInvalid, PG::CheckViolation: ERROR: no partition of relation \"audit_events\" found for row\nDETAIL: Partition key of the failing row contains (created_at) = (2022-06-08 08:31:59.79614).\n"}

Stacktrace (production_json.log)

"app/services/audit_event_service.rb:84:in `log_security_event_to_database'",
"app/services/audit_event_service.rb:43:in `security_event'",
"ee/app/services/ee/audit_event_service.rb:122:in `security_event'",
"app/controllers/omniauth_callbacks_controller.rb:228:in `log_audit_event'",
"app/controllers/omniauth_callbacks_controller.rb:155:in `sign_in_user_flow'",
"app/controllers/ldap/omniauth_callbacks_controller.rb:23:in `ldap'",
"ee/lib/gitlab/ip_address_state.rb:10:in `with'",
"ee/app/controllers/ee/application_controller.rb:44:in `set_current_ip_address'",
"app/controllers/application_controller.rb:491:in `set_current_admin'",
"lib/gitlab/session.rb:11:in `with_session'",
"app/controllers/application_controller.rb:482:in `set_session_storage'",
"app/controllers/application_controller.rb:476:in `set_locale'",
"lib/gitlab/error_tracking.rb:50:in `with_context'",
"app/controllers/application_controller.rb:541:in `sentry_context'",
"app/controllers/application_controller.rb:469:in `block in set_current_context'",
"lib/gitlab/application_context.rb:52:in `block in use'",
"lib/gitlab/application_context.rb:52:in `use'",
"lib/gitlab/application_context.rb:20:in `with_context'",
"app/controllers/application_controller.rb:462:in `set_current_context'",
"ee/lib/gitlab/jira/middleware.rb:19:in `call'"

Thank you for taking the time to read through this post. If I can supply any more information to help resolve this issue, please let me know.

I found the solution to the problem. Unlike initially suspected, the problem does not occur when reading from the DB, but when attempting to write: the partition of the audit_events table for the current month was missing.

In case anyone runs into the same problem in the future, here is what I've done.
First, list the existing partitions of the audit_events table:

SELECT
    nmsp_parent.nspname AS parent_schema,
    parent.relname      AS parent,
    nmsp_child.nspname  AS child_schema,
    child.relname       AS child
FROM pg_inherits
    JOIN pg_class parent            ON pg_inherits.inhparent = parent.oid
    JOIN pg_class child             ON pg_inherits.inhrelid   = child.oid
    JOIN pg_namespace nmsp_parent   ON nmsp_parent.oid  = parent.relnamespace
    JOIN pg_namespace nmsp_child    ON nmsp_child.oid   = child.relnamespace
WHERE parent.relname='audit_events';

Gave me this:

 parent_schema |    parent    |       child_schema        |        child
---------------+--------------+---------------------------+---------------------
 public        | audit_events | gitlab_partitions_dynamic | audit_events_202205
 public        | audit_events | gitlab_partitions_dynamic | audit_events_202204
 public        | audit_events | gitlab_partitions_dynamic | audit_events_202203
 public        | audit_events | gitlab_partitions_dynamic | audit_events_202202
 public        | audit_events | gitlab_partitions_dynamic | audit_events_202201
 public        | audit_events | gitlab_partitions_dynamic | audit_events_202112
Here we can see that there is no audit_events_202206 partition.
I created the missing partition using

-- Note: the upper bound of a range partition is exclusive, so the June
-- partition must end at '2022-07-01' ('2022-06-30' would leave the last
-- day of the month uncovered). The child table goes into the
-- gitlab_partitions_dynamic schema, like the existing partitions.
CREATE TABLE gitlab_partitions_dynamic.audit_events_202206
    PARTITION OF audit_events
    FOR VALUES FROM ('2022-06-01') TO ('2022-07-01');
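
A quick sanity check (the listing query from above, filtered down to the new partition) should now return one row:

SELECT child.relname
FROM pg_inherits
    JOIN pg_class child ON pg_inherits.inhrelid = child.oid
WHERE child.relname = 'audit_events_202206';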

And now it works again.

What is still unresolved is why this happened. I will check next month whether the problem occurs again; if not, it is probably safe to assume a job failed once.
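
Until the root cause is found, a possible stop-gap would be to pre-create next month's partition by hand (the same statement as above, shifted by one month; the upper bound is again exclusive):

CREATE TABLE gitlab_partitions_dynamic.audit_events_202207
    PARTITION OF audit_events
    FOR VALUES FROM ('2022-07-01') TO ('2022-08-01');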