What exactly is downloaded when using the Job Artifacts API?

I am working on automating some of our build/deploy processes, and I am confused as to what goes on behind the scenes when I use the Job Artifacts API.

curl --location --header "PRIVATE-TOKEN: <your_access_token>" "https://gitlab.example.com/api/v4/projects/1/jobs/artifacts/main/raw/some/release/file.pdf?job=pdf"

My Setup Description:
I have multiple projects stored in a self-managed GitLab server. One of those, we’ll call ParentProj, generates a file named MasterFile.txt in some/artifacts/dir. I have a second project, called ChildProj, which has a configuration set in place to create the files SubFile1.txt, SubFile2.txt, …, SubFileN.txt whenever I manually push MasterFile.txt into ChildProj.

I would like to automate this portion, so I introduced a pipeline trigger to ParentProj, which successfully triggers ChildProj. When ChildProj is triggered, I use the Job Artifacts API to download MasterFile.txt into ChildProj. In my testing, I was able to download the desired file successfully. But now that I have automated the bridge between the two projects, MasterFile.txt is instead produced with the text {"message":"404 Not found"}.

What am I trying to understand?
When I call this API to fetch MasterFile.txt from ParentProj, what exactly is downloaded?

curl --location --header "PRIVATE-TOKEN: <your_access_token>" "https://gitlab.example.com/api/v4/projects/1/jobs/artifacts/main/raw/some/release/file.pdf?job=pdf"

Say I have two pipelines from ParentProj:
ParentProj Pipeline 1 (the most recent) ran successfully, but did not produce MasterFile.txt.
ParentProj Pipeline 2 (the second most recent) ran successfully and created MasterFile.txt.
I thought that calling the API would retrieve MasterFile.txt from the last successful pipeline that executed the job producing MasterFile.txt.

My .gitlab-ci.yml Configuration:

  # ParentProj .gitlab-ci.yml
  master_file:
    stage: build
    script:
      - cmd /c scripts\master_file\build_master.bat
      - cmd /c "curl -X POST -F token=<TRIGGER_TOKEN> -F ref=my-branch http://my-server/api/v4/projects/<ChildProjID>/trigger/pipeline"
    artifacts:
      paths:
        - some/artifacts/dir/MasterFile.txt

  # ChildProj .gitlab-ci.yml
  fetch_master_file:
    stage: fetch
    script: cmd /c scripts\fetch_master_file.bat
    rules:
      - if: $CI_PIPELINE_SOURCE == "trigger"

ChildProj\scripts\fetch_master_file.bat (redacted to exclude comments and debug statements; fetches MasterFile.txt, commits if it is updated, and pushes to itself)

curl --location --header "PRIVATE-TOKEN: <ACCESS_TOKEN>" "http://my-server/api/v4/projects/<ParentProjID>/jobs/artifacts/my-branch/raw/some/artifacts/dir/MasterFile.txt?job=master_file" --output ./master/MasterFile.txt
git diff-files --quiet ./master/MasterFile.txt
if not "%ERRORLEVEL%"=="0" (
    git add ./master/MasterFile.txt
    git remote set-url origin http://%GIT_CI_USER%:%GIT_CI_PASS%@my-server/<ChildProjGROUP>/ChildProj.git
    git commit -m "ci(auto-commit): 'MasterFile.txt' file changes, auto-commit from pipeline"
    git push
)
Thank you all in advance! I am still new to GitLab and keep discovering there are many gotchas, and sometimes the documentation is not the most clear. Hopefully this all makes sense.

Artifacts from the last successful pipeline on that ref are fetched; there is no check whether that pipeline actually created the artifact, and it is the user's job to ensure the required artifact is there.


So this seems to be a timing issue that I would need to address… and my understanding may be off.

I thought that by chaining my pipelines the way I did (the ParentProj master_file job builds the file and triggers ChildProj > the ChildProj fetch_master_file job calls the Job Artifacts API to download MasterFile.txt), the MasterFile.txt artifact produced by the triggering master_file job would be the one fetched. But since the initial ParentProj pipeline may still be in a running state when the ChildProj pipeline is triggered, it makes sense that MasterFile.txt may not be available: the API then falls back to the previous successful pipeline, which may not have executed the job at all.

I’ll have to think of a more creative way to get MasterFile.txt into my ChildProj, thank you for chiming in!
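For what it's worth, one possible shape for that (a sketch, not tested against this setup): have the ParentProj trigger call forward its own pipeline ID as a trigger variable (-F "variables[PARENT_PIPELINE_ID]=$CI_PIPELINE_ID"), then have ChildProj wait for that exact pipeline to finish and download the artifact by job ID instead of by ref. The project ID, token, and job name below are placeholders from this thread, and jq is assumed to be available on the runner:

```shell
#!/bin/sh
# Sketch: resolve the exact ParentProj pipeline that did the triggering,
# wait for it to finish, then download MasterFile.txt from its master_file
# job by job ID. PARENT_PIPELINE_ID is assumed to arrive as a trigger variable.
API="http://my-server/api/v4/projects/<ParentProjID>"
AUTH="PRIVATE-TOKEN: <ACCESS_TOKEN>"

wait_and_fetch() {
  # Poll until the triggering pipeline reaches a final state.
  while :; do
    status=$(curl -s --header "$AUTH" "$API/pipelines/$PARENT_PIPELINE_ID" | jq -r .status)
    case "$status" in
      success) break ;;
      failed|canceled|skipped) echo "parent pipeline ended as: $status" >&2; return 1 ;;
      *) sleep 10 ;;
    esac
  done

  # Find the job that produced the artifact, then download from that job
  # directly, so "latest successful pipeline" semantics no longer apply.
  job_id=$(curl -s --header "$AUTH" "$API/pipelines/$PARENT_PIPELINE_ID/jobs" \
    | jq -r '.[] | select(.name == "master_file") | .id')
  curl --location --fail --header "$AUTH" \
    "$API/jobs/$job_id/artifacts/some/artifacts/dir/MasterFile.txt" \
    --output ./master/MasterFile.txt
}
```

Waiting here cannot deadlock, because the trigger call is asynchronous: the parent's master_file job fires the trigger, carries on, and finishes, at which point its artifacts are uploaded and the poll succeeds.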

I don’t know why you are using API calls to trigger the ChildProj pipeline or to fetch artifacts. There are native keywords for both: trigger to start downstream pipelines, and needs:pipeline:job to fetch artifacts from the parent pipeline.
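For readers landing here, the parent-child variant of this approach looks roughly like the following (a sketch reusing the job and file names from this thread; note that needs:pipeline:job only works within one parent-child pipeline hierarchy, so for two separate projects, as in this thread, the downstream pipeline would be started with trigger:project instead, and cross-project artifact download via needs:project is a paid-tier feature):

```yaml
# ParentProj .gitlab-ci.yml: build the artifact, then trigger the child
# pipeline, forwarding this pipeline's ID so the child can name the exact
# upstream job.
master_file:
  stage: build
  script: cmd /c scripts\master_file\build_master.bat
  artifacts:
    paths:
      - some/artifacts/dir/MasterFile.txt

trigger_child:
  stage: deploy
  trigger:
    include: child.yml
  variables:
    PARENT_PIPELINE_ID: $CI_PIPELINE_ID

# child.yml: download artifacts from the named job of the exact parent
# pipeline, with no "latest successful pipeline" ambiguity.
fetch_master_file:
  script: cmd /c scripts\fetch_master_file.bat
  needs:
    - pipeline: $PARENT_PIPELINE_ID
      job: master_file
```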

Although using the native keywords may have prevented the issues I ran into, I am using batch scripts to complete most of the execution to maintain portability.

My employer is a Windows shop; we create a Windows desktop application. I believe that if we ever decide to move to a different platform, having batch scripts execute the majority (or all) of our pipeline logic would make the transition easier, since it would require smaller adjustments. The more tied we are to GitLab’s native methods, the harder the transition would be and the more resources it would take.