Artifact Access from Pipeline to Pipeline in Same Project

Problem

Due diligence (links I have already gone through):

  - artifact fetching
  - artifact dependency
  - artifact docs
  - GitLab issue for artifacts
  - and many more

I will stop there, since posts often have a limit on the number of links allowed.

GitLab Version

We are on GitLab SaaS with a Premium subscription, so feature availability should not be an issue.

The problem I am trying to solve, which seems like it should be fairly easy, is to access, from a release_job in pipeline two, artifacts created by a build_job in pipeline one. The build job creates artifacts like so:

build_package:
...
  artifacts:
    paths:
      - dist/

branch flow:


  O           ____________         ____________
 _|_  push   |            |  MR   |            |
  |  ------> | branch one | ----->| branch two |  
 / \         |____________|       |____________|
                   |                    |
                   |                    |
                   v                    |
                artifact        <- needs artifact
        

“Solutions” Seen

Needs

I have seen people say they use a needs entry somewhat like this:

release_job:
  needs:
    - project: $PROJECT_WITH_ARTIFACTS
      job: $JOB_THAT_CREATES_ARTIFACTS
      ref: $BRANCH_NAME_WITH_ARTIFACTS
      artifacts: true

This is usually used to reference a pipeline's artifacts from a different project, but it is said that you can put your own project in the project: parameter as well. However, in my case branch one is a different ref: every time, so I am unsure how branch two would get that dynamic information. It is likely I am misunderstanding something simple, but I could not get this to work even when I hardcoded ref: to a ref name known ahead of time.
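
For reference, the same-project variant I tried looked roughly like this, with the ref hardcoded purely as a test (the project path, job name, and branch name below are illustrative):

release_job:
  needs:
    - project: my-group/my-project     # the same project the pipeline runs in
      job: build_package               # the job that produced dist/
      ref: feature/known-branch-name   # hardcoded for the test; normally not known ahead of time
      artifacts: true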

Artifacts API

The API docs show how you can download the artifacts archive:

curl --location --output artifacts.zip "https://gitlab.example.com/api/v4/projects/$CI_PROJECT_ID/jobs/artifacts/main/download?job=test&job_token=$CI_JOB_TOKEN"

Again, you need dynamic information in that curl call, since branch one has a different ref name on every new branch push.
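
Wired into a job, that would look something like the sketch below, except that I have nothing sensible to put in place of <ref> (the job name and the unzip step are illustrative):

release_job:
  script:
    # <ref> would have to be branch one's ref name, which this pipeline does not know
    - 'curl --location --output artifacts.zip "${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/jobs/artifacts/<ref>/download?job=build_package&job_token=${CI_JOB_TOKEN}"'
    - unzip artifacts.zip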

Both the needs approach and the API approach suffer from the same issue: from branch two, I do not know branch one's ref name.

Is There a Canonical Way?

Is there a canonical way in GitLab to access artifacts from earlier pipelines in the same project?

Using the API seems like a roundabout way to do it, since the CI/CD pipeline already lives in the same project as the artifacts it needs.

Any help is greatly appreciated. If more information is needed, please ask. I am also open to other suggestions that would accomplish the same thing, such as changing the branching strategy.

Thank you.

AFAIK, there is no native way in GitLab to pass artifacts between different pipelines. I believe your best bet would be to upload to some artifact/package repository (GitLab's built-in one or another one, up to you), and then download from it in the other pipeline. However, make sure to set up proper cleanup rules for this.

In order to identify where the artifact is coming from (so it can be downloaded), you can try using built-in CI variables like $CI_COMMIT_BRANCH (to build an artifact name and upload it in branch one) and $CI_MERGE_REQUEST_SOURCE_BRANCH_NAME (to build the name for downloading the artifact in branch two).
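
A rough sketch of that approach using GitLab's generic package registry (the package name, file name, version, and the sanitizing step are illustrative; branch names containing slashes have to be normalized because slashes are not valid in package names):

# Pipeline on branch one: upload the built package under a name derived from the branch.
upload_package:
  script:
    - SAFE_BRANCH=$(echo "$CI_COMMIT_BRANCH" | tr '/' '-')
    - >
      curl --header "JOB-TOKEN: $CI_JOB_TOKEN"
      --upload-file dist/my_package-0.0.1-py3-none-any.whl
      "${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/packages/generic/${SAFE_BRANCH}/0.0.1/my_package-0.0.1-py3-none-any.whl"

# Merge request pipeline targeting branch two: download it again via the MR source branch name.
download_package:
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
  script:
    - SAFE_BRANCH=$(echo "$CI_MERGE_REQUEST_SOURCE_BRANCH_NAME" | tr '/' '-')
    - >
      curl --header "JOB-TOKEN: $CI_JOB_TOKEN"
      --output my_package-0.0.1-py3-none-any.whl
      "${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/packages/generic/${SAFE_BRANCH}/0.0.1/my_package-0.0.1-py3-none-any.whl"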

I am using GCP Artifact Registry as the final destination for the Python package once it is built, tested, and released/tagged. However, for intermediate steps (between branches), storing it in a remote repository seems excessive. I have seen people use a local PyPI repo, but again that seems like a heavy solution for something that already exists in your repo: the artifacts under your “Build” tab.

If there really is no way to download/access artifacts from one pipeline to another, that seems like a huge gap in GitLab functionality. What seems more likely is that I am missing something about how to properly access/download them, or that I am designing the CI/CD pipeline strategy poorly. :thinking:

Also, this documentation for needs:project here makes me think this functionality must exist and that GitLab designed artifacts to be usable across pipelines. Specifically this line:

To download artifacts from a different pipeline in the current project, set project to be the same as the current project, but use a different ref than the current pipeline. Concurrent pipelines running on the same ref could override the artifacts.
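
Reading that, the intended usage for my case would seem to be something like the sketch below, assuming release_job runs in a merge request pipeline and that variables are expanded in needs:project and needs:ref (worth verifying):

release_job:
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
  needs:
    - project: $CI_PROJECT_PATH                     # same project as the current pipeline
      job: build_package
      ref: $CI_MERGE_REQUEST_SOURCE_BRANCH_NAME     # branch one, i.e. a different ref
      artifacts: true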

Cache

There is caching in GitLab. While that could probably be made to work (a rough sketch follows the list), there are two issues I see for my case:

  1. We don't own/operate our Kubernetes cluster, so making the runner configuration change and setting everything up isn't easy (it's possible, but potentially on a long time horizon).
  2. A cache is different from an artifact, so the use case, while similar, is not identical. It seems like the wrong place to store artifacts for subsequent pipelines.
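
For completeness, the cache variant would look roughly like this; it also relies on the runners having a shared/distributed cache configured, which is the configuration change from point 1 (job contents are illustrative):

build_package:
  cache:
    key: $CI_COMMIT_BRANCH                          # keyed by branch one's name
    paths:
      - dist/
    policy: push                                    # only upload the cache here
  script:
    - python -m build                               # illustrative

release_job:
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
  cache:
    key: $CI_MERGE_REQUEST_SOURCE_BRANCH_NAME       # pull what branch one's pipeline pushed
    paths:
      - dist/
    policy: pull
  script:
    - ls dist/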

Branch Strategy

If I were to change the branching strategy from this:

                                   long running
                                     branch
                                        |
                                        V
  O           ____________         ____________
 _|_  push   |            |  MR   |            |
  |  ------> | branch one | ----->| branch two |  
 / \         |____________|       |____________|
                   |                    |
                   |                    |
                   v                    |
                 build
                artifact        <- needs artifact

to this:

                                   long running
                                     branch
                                        |
                                        v
  O           ____________         ____________
 _|_  push   |            |  MR   |            |
  |  ------> | branch one | ----->| branch two |  
 / \         |____________|       |____________|
                   |                    |
                   |                    |
                   v                    v
               just lint            build,test,
                                    and publish artifact

then branch two would have access to the artifact. However, this lengthens the development feedback loop and requires an MR just to find out whether a change causes a failure. That isn't ideal.

The thing is:

Normally you must build the artifact again AFTER the merge anyway (since the code has been merged, you need a new build). Since I'm not sure what you're doing, I didn't question your logic/artifacts, but perhaps you should.

Otherwise, you could consider using merged results pipelines, where the MR pipeline already builds artifacts as if the code had already been merged. Then you could avoid making two builds, one before and one after the merge.
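
Roughly, in .gitlab-ci.yml terms the idea is something like this (merged results pipelines themselves are enabled in the project's merge request settings; the build command is illustrative):

build_package:
  rules:
    # Runs in merge request pipelines; with merged results pipelines enabled, the
    # pipeline runs against the result of merging branch one into branch two.
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
  script:
    - python -m build                               # illustrative
  artifacts:
    paths:
      - dist/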

I took a quick glance at merged results pipelines, but I will need to spend a little more time digesting them to see if they fit my use case. Thank you for the suggestion.

As for building artifacts after a merge, why is this considered normal or even desirable? It seems to me that this strategy lengthens the feedback loop for a developer. For example:

   ____ dev feedback loop _______________
  |                                      |
  |                                      v
  |
  |              short                 long
                 branch                branch
  O           ____________          ____________
 _|_  push   |            |  MR    |            |
  |  ------> | branch one | -----> | branch two |  
 / \         |____________|        |____________|
                   |                     |
  ^                |                     |
  |                v                     v
  |              lint               build, test,
  |                                 publish artifact
  |
  |___ dev feedback loop ________________|

In this scenario, you do not fail fast. You will not catch build or test errors until after a merge, which often requires a person to review and approve. You could certainly also build and test in the short-lived branch, but that duplicates pipeline work, which can become inefficient. Ideally, you would lint, build, and test in the short branch and carry the built package artifact over so you could publish it from the long-running branch. That would create a dev feedback loop like this (sketched in .gitlab-ci.yml terms after the diagram):

   ____ dev FL ____
  |                |
  |                v
  |
  |              short                 long
                 branch                branch
  O           ____________          ____________
 _|_  push   |            |  MR    |            |
  |  ------> | branch one | -----> | branch two |  
 / \         |____________|        |____________|
                   |                     |
  ^                |                     |
  |                v                     v
  |           lint, build,          publish artifact
  |           test
  |                |  
  |___ dev FL _____|
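
Sketched out, that intent would be something like the following, with the open question being how the publish job on the long-running branch gets hold of the dist/ built in the earlier pipeline (job names, commands, and the branch name branch-two are illustrative):

lint:
  rules:
    - if: '$CI_COMMIT_BRANCH && $CI_COMMIT_BRANCH != "branch-two"'
  script:
    - ruff check .                                  # illustrative

build_and_test:
  rules:
    - if: '$CI_COMMIT_BRANCH && $CI_COMMIT_BRANCH != "branch-two"'
  script:
    - python -m build                               # illustrative
    - pytest
  artifacts:
    paths:
      - dist/

publish_package:
  rules:
    - if: '$CI_COMMIT_BRANCH == "branch-two"'       # the long-running branch
  script:
    # open question: how to get the dist/ produced in branch one's pipeline here
    - twine upload dist/*                           # illustrative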

Now, I am certainly not a pipeline or CI/CD expert, so it is very possible I am not seeing a different solution that is better and more efficient than the two I visualized here. Is there a strategy you use that shortens the dev feedback cycle, or has this not really been a problem for you?

Your solution is ideal in theory, and definitely correct for the case where branch one is 100% in sync with the long-lived branch two (already merged or rebased onto it). I believe that is not the case for most projects, where a lot of developers work in parallel and constantly merge to branch two. And since that is the reality, we have to build twice anyway, once before and once after the merge. That is why I believe this is not an issue for most people; it is easy to set up and maintain. Also, build speed can be improved by using caching, etc.

Again, I would not say I'm an absolute expert on the topic, but I do have three years of experience with GitLab, so I'm pretty familiar with GitLab pipelines and features. I'm not sure how other CI/CD tools handle this situation or whether they offer different options.


Thank you, I appreciate your help, information, and feedback.

I ended up building twice, as that seemed like the best way forward. I am glad it aligns with what you have been doing as well.
