Speeding up gitlab-rake gitlab:doctor:secrets

Problem to solve

Good day! I’d like to suggest a possible improvement to the encrypted-data decryption check for the p_ci_builds table.
I run GitLab in production. After every update, I run the decryption check with the command:
gitlab-rake gitlab:doctor:secrets
The ci:builds check takes the longest, about 6 hours, because the table contains more than 2.3 million records. While I was discussing the issue with an AI assistant, it became clear that the check reads every record in the table, regardless of whether it contains an encrypted value. Together with the AI, I wrote a script that checks only rows containing encrypted values, and the check time dropped from 6 hours to 10 minutes.

I’d like to ask you to review this script to determine whether it’s correct for the check.

#!/usr/bin/env ruby
# frozen_string_literal: true

require 'logger'
require 'optparse' # provides OptionParser, used below
require 'rainbow'

# Initialize logger
logger = Logger.new(STDOUT)
logger.level = Logger::INFO

# Parse command line arguments
options = {}
OptionParser.new do |opts|
  opts.banner = "Usage: check_ci_builds_secrets.rb [options]"
  opts.on("-n N", "--last N", Integer, "Check only the last N records") do |n|
    options[:last] = n
  end
  opts.on("-h", "--help", "Show this help message") do
    puts opts
    exit
  end
end.parse!

# Check attribute decryption correctness
def valid_attribute?(data, attr, logger)
  data.send(attr)
  true
rescue OpenSSL::Cipher::CipherError, TypeError
  logger.debug Rainbow("    ❌ Failed to decrypt #{attr} for record with ID: #{data.id}").red
  false
rescue StandardError => e
  logger.debug Rainbow("    ❌ Error for #{attr} in record with ID: #{data.id}: #{e}").red
  false
end

# Main check logic
def check_ci_builds_secrets(options, logger)
  model = Ci::Build
  encrypted_attributes = []

  # Collect only found encrypted attributes
  if model.respond_to?(:encrypted_attributes) && !model.encrypted_attributes.nil?
    encrypted_attributes += model.encrypted_attributes.to_a
    logger.info Rainbow("  ✅ Found #{model.encrypted_attributes.size} encrypted attributes via `encrypts`: #{model.encrypted_attributes.to_a.join(', ')}").green
  end

  if model.respond_to?(:attr_encrypted_attributes) && !model.attr_encrypted_attributes.nil?
    encrypted_attributes += model.attr_encrypted_attributes.keys
    logger.info Rainbow("  ✅ Found #{model.attr_encrypted_attributes.keys.size} encrypted attributes via `attr_encrypted`: #{model.attr_encrypted_attributes.keys.join(', ')}").green
  end

  if model.respond_to?(:encrypted_token_authenticatable_fields) && !model.encrypted_token_authenticatable_fields.nil?
    encrypted_attributes += model.encrypted_token_authenticatable_fields.map { |attr| "#{attr}_encrypted" }
    logger.info Rainbow("  ✅ Found #{model.encrypted_token_authenticatable_fields.size} encrypted tokens: #{model.encrypted_token_authenticatable_fields.join(', ')}").green
  end

  if encrypted_attributes.empty?
    logger.info Rainbow("❌ No encrypted attributes found for model #{model}.").red
    return
  end

  encrypted_attributes = encrypted_attributes.uniq
  logger.info Rainbow("✅ Found #{encrypted_attributes.size} unique encrypted attributes: #{encrypted_attributes.join(', ')}").green
  logger.info Rainbow("Starting check...").blue

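  # Select only necessary records.
  # find_each ignores an order set on the scope, so the direction is passed to
  # find_each itself (supported since Rails 6.1); the limit is still honored.
  scope = options[:last] ? model.limit(options[:last]) : model.all
  batch_order = options[:last] ? :desc : :asc
  total_records = scope.count
  logger.info Rainbow("Total records to check: #{total_records}").blue

  failures_per_row = Hash.new { |h, k| h[k] = [] }
  processed_records = 0

  # Increase batch size for speed
  scope.find_each(batch_size: 5000, order: batch_order) do |data|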
    encrypted_attributes.each do |attr|
      unless valid_attribute?(data, attr, logger)
        failures_per_row[data.id] << attr
      end
    end
    processed_records += 1
    logger.debug Rainbow("Processed #{processed_records}/#{total_records} records").cyan if (processed_records % 5000).zero?
  end

  # Output results
  if failures_per_row.empty?
    logger.info Rainbow("✅ No errors found! All encrypted attributes are correct.").green
  else
    logger.info Rainbow("❌ Found #{failures_per_row.keys.count} records with errors:").red
    failures_per_row.each do |row_id, attrs|
      logger.info Rainbow("  - Record with ID: #{row_id}, failed to decrypt attributes: #{attrs.join(', ')}").red
    end
  end
end

# Run the script
check_ci_builds_secrets(options, logger)
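
For anyone reviewing: the idea of skipping rows that hold no encrypted value could be expressed as a SQL condition on the scope. The snippet below is only a minimal sketch of that idea, not GitLab's implementation. The helper name non_null_condition is hypothetical, and it assumes token_encrypted is the only encrypted column actually stored on the table, which would need to be confirmed against the real schema.

```ruby
# frozen_string_literal: true

# Hypothetical helper (not part of GitLab): builds a SQL condition that matches
# only rows where at least one of the given encrypted columns holds a value.
def non_null_condition(columns)
  raise ArgumentError, 'no columns given' if columns.empty?

  columns.map do |column|
    # Accept plain identifiers only, so nothing unexpected reaches the SQL string.
    unless column.to_s.match?(/\A[a-z_][a-z0-9_]*\z/)
      raise ArgumentError, "unexpected column name: #{column}"
    end

    "#{column} IS NOT NULL"
  end.join(' OR ')
end

# Assumption: token_encrypted is the only stored encrypted column.
condition = non_null_condition(%w[token_encrypted])
puts condition # prints "token_encrypted IS NOT NULL"
```

Inside a Rails runner session, the resulting string could be passed to the scope, for example model.where(condition), before calling find_each, so that rows with nothing to decrypt are never loaded.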

Steps to reproduce

Just run gitlab-rake gitlab:doctor:secrets on an instance with 2.3 million records in the table.

Configuration

16 GB RAM, 10 CPUs, 80 GB SSD; PostgreSQL 14 (2 GB + 2 CPUs); Intel Xeon Gold 6448Y, 2.8 GHz

Versions

  • GitLab (Web: /help or self-managed system information sudo gitlab-rake gitlab:env:info): 17.11.9 - 18.1.2

If you want the GitLab developers to consider including or improving this, you would have to open an issue here: Issues · GitLab.org / GitLab · GitLab. The developers don’t read forum posts; they only look at issues that have been opened.

They would have to check and verify that the AI-generated code is accurate, good enough, and unlikely to cause problems such as destroying data.


Thank you for your reply. I will open an issue using the link you provided.
