GitLab.com Pages + CI + "too large archive"

Hi,

I’m trying to host a static site of around 6 GB on GitLab.com. The Pages site limit is said to be 10 GB, with a maximum artifact size of 1 GB. How do I proceed to use the 6 GB? I keep getting the error “too large archive” in CI/CD once I exceed 1 GB, despite splitting my data into separate commits of 500 MB each. It seems the deploy re-fetches the entire repo.
I don’t get how I am supposed to do it.

Thanks for any input.

Hi,

I’m curious what you are trying to host at this size; can you share the repository URL? GitLab Pages is meant for website hosting, backed by a Git repository, with smaller-sized assets. If you are looking for large file hosting, I’d recommend looking into services like S3, Google Drive, a self-managed Nextcloud, or other cloud service offerings.

Cheers,
Michael


Hi! Hope you don’t mind my reply to this old post, but I’m also running into the 1 GB CI/CD limit (although my overall static site is much smaller than the original poster’s).

I’m trying to host a simple, non-monetized personal travel blog on Gitlab Pages. This is the repo (id: 7492392) and the site (https://charmainejade.com). I’m not trying to use the page for large file hosting; I’m just trying to make my blog posts show up nicely and non-pixelated regardless of screen size. Would appreciate any advice!

I tried accessing the project by ID using the following script, but it seems the project is not public.

#!/usr/bin/env python

import gitlab
import os
import json 
import sys

GITLAB_SERVER = os.environ.get('GL_SERVER', 'https://gitlab.com')
GITLAB_TOKEN = os.environ.get('GL_TOKEN') # token requires developer permissions
PROJECT_ID = os.environ.get('GL_PROJECT_ID')

if __name__ == "__main__":
    if not GITLAB_TOKEN:
        print("🤔 Please set the GL_TOKEN env variable.")
        sys.exit(1)

    gl = gitlab.Gitlab(GITLAB_SERVER, private_token=GITLAB_TOKEN, pagination="keyset", order_by="id", per_page=100)

    print("# Starting...", end="", flush=True)

    # Collect all projects, or prefer projects from a group id, or a project id
    projects = []

    # Direct project ID
    if PROJECT_ID:
        print("DEBUG: PROJECT_ID")
        projects.append(gl.projects.get(PROJECT_ID))

    for project in projects:
        project_obj = gl.projects.get(project.id, statistics=True)

        print("### Project {n} statistics\n {s}\n".format(n=project_obj.name_with_namespace, s=json.dumps(project_obj.statistics, indent=4)))    

Can you change the project visibility to public so we can make better suggestions by inspecting the source code and CI/CD pipelines?

Updated the project visibility to public!

Use Git LFS, which allows storing large files like images, videos, and other assets. LFS stores the files outside the repository, which reduces the repository size.
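
For example, a minimal sketch of tracking image assets with LFS (the patterns are illustrative; track whatever binary types you actually use):

git lfs install                  # one-time setup per machine
git lfs track "*.jpg" "*.png"    # route matching files through LFS
git add .gitattributes           # git lfs track records the patterns here
git commit -m "Track images with Git LFS"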

Thanks. It looks like the repository is empty (?), or my permissions on the project repository are limited. Charmaine Arellano-Chua / charmainejade.gitlab.io · GitLab

Maybe you can add me with the Developer role so I can inspect the project more closely.

Thanks, added you as a developer with access until the end of the month. Let me know if that works.

Thanks. Project statistics show what I assumed: the raw data is ~1.1 GB, while the entire Git repository with history tracking sums up to 2.3 GB locally (after a git clone; no worries, I’ll delete it after testing).

The pages job fails due to the public/ content generated by Jekyll, saying that the artifact data is too large (exceeds 1 GB).

pages:
  stage: deploy
  script:
  - bundle exec jekyll build -d public
  artifacts:
    paths:
    - public
  only:
  - master

The documentation for GitLab.com settings says that artifacts can have a maximum size of 1 GB.

There is no way around this limit, so the only strategy here is to reduce the size of what the Jekyll build dumps into public/.

Local Jekyll analysis

I tried updating the Jekyll installation locally, but it is hard with Ruby 3.2.3 because the project targets the old Ruby 2.3 and pins outdated dependencies.

  1. Deleted Gemfile.lock (can be regenerated later)
  2. Removed all version dependencies in the Gemfile
  3. redcarpet is no longer supported as a Markdown engine (Cannot use redcarpet as markdown processor · Issue #7838 · jekyll/jekyll · GitHub), so I patched _config.yml to use kramdown instead.

Attached is the diff, in case it is helpful. After applying the changes, run bundle update && bundle install to regenerate the Gemfile.lock file.

diff --git a/Gemfile b/Gemfile
index c510961..918fd2e 100644
--- a/Gemfile
+++ b/Gemfile
@@ -8,8 +8,7 @@ source "https://rubygems.org"
 #
 # This will help ensure the proper Jekyll version is running.
 # Happy Jekylling!
-gem "jekyll", "~> 3.8.3"
-gem 'redcarpet', '~> 3.0.0'
+gem "jekyll"

 # If you want to use GitHub Pages, remove the "gem "jekyll"" above and
 # uncomment the line below. To upgrade, run `bundle update github-pages`.
@@ -17,7 +16,7 @@ gem 'redcarpet', '~> 3.0.0'

 # If you have any plugins, put them here!
 group :jekyll_plugins do
-  gem "jekyll-feed", "~> 0.6"
+  gem "jekyll-feed"
   gem "jekyll-paginate"
 end
diff --git a/_config.yml b/_config.yml
index e26ce9b..f79fb35 100755
--- a/_config.yml
+++ b/_config.yml
@@ -1,5 +1,5 @@
 # Default, Defaults
-markdown: redcarpet
+markdown: kramdown
 highlighter: pygments # or rouge or null
 exclude: [vendor, "node_modules", "gulpfile.js", "package.json"]
 paginate: 5

Running

bundle exec jekyll build -d public

leads to a public/ directory of 1.1 GB.
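
To see which files dominate that footprint, du is usually enough (GNU coreutils assumed for sort -h):

du -sh public                        # total size of the generated site
du -ah public | sort -rh | head -20  # the 20 largest files and directories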

One idea is to reduce the image sizes and optimize them for web viewing, for example by limiting them to a 1024x768 resolution. ImageMagick provides a CLI tool for that, documented in Tools and tips | The GitLab Handbook

find . -type f -name '*.jpg' -exec sh -c 'convert "$1" -resize 1024x "$1"' sh {} \;

:thinking: This brings the public directory down to ~780 MB. It might still cause trouble if you add more images in the future, though.
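
If that still gets too close to the limit, a variant that only shrinks images wider than 1024 px and recompresses them could buy more headroom (the quality value is a starting point to experiment with, not a recommendation):

find . -type f -name '*.jpg' -exec sh -c 'convert "$1" -resize "1024x>" -quality 85 "$1"' sh {} \;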

:bulb: My suggestion is to use a different storage provider for hosting image binaries and galleries. GitLab Pages is not designed for that. Google Photos, Dropbox, etc. offer dedicated image hosting capabilities.

Other ideas

This will not make a difference, unfortunately. The problem lies within the Pages upload limits, not within the Git repository itself. It is still advisable not to store large binaries in the Git history, though. For example, if you modify that 500 MB chunk in a future commit, the new version is added on top of the existing repository size. While Git aims to compress differences efficiently (delta compression in packfiles), binary files delta poorly, so some overhead is to be expected.
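
You can watch that growth on any clone; Git reports the packed on-disk size directly:

git gc                   # repack first so the numbers reflect the actual on-disk size
git count-objects -vH    # size-pack is the compressed repository size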

LFS can be one way to avoid this behavior, as suggested above.

For your existing repository, you would need to reset the Git history and start from scratch, for example by deleting and recreating the GitLab project. On your local machine, delete the .git directory, start over with git init, and selectively configure Git LFS for the binary assets. After setup, push the changes to the GitLab server.
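
A rough sketch of that reset, assuming a fresh GitLab project and that you back up your working copy first (replace <namespace>/<project> with your own path):

cd my-site                          # your working copy (back it up first!)
rm -rf .git                         # drop the old history entirely
git init
git lfs install
git lfs track "*.jpg" "*.png"       # adjust patterns to your binary assets
git add .
git commit -m "Initial commit with Git LFS"
git branch -M master                # the CI job above deploys from master
git remote add origin git@gitlab.com:<namespace>/<project>.git
git push -u origin master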

:x: It will not solve your problem here, though, because GitLab Pages does not support LFS yet. See Support git-lfs files in GitLab Pages (#16027) · Issues · GitLab.org / GitLab · GitLab