Only run job if it’s the first time this pipeline runs

Hi everybody!

I’m currently building a pipeline to automatically deploy review apps.
It’s going quite well, but I ran into an issue I don’t know how to overcome.

I have a job that loads data into a database. It is currently set to trigger manually and is working quite well. The dataset is quite big (multiple GB), and I don’t want to run the job every time someone makes some changes, only on new branches/pipelines. I tried to use rules to only run the job the first time the pipeline gets created, but I cannot figure out which keys I have to check.

I tried

rules:
    - if: '$CI_OPEN_MERGE_REQUEST'
      when: never

This only triggers when creating a new branch, but it does not work when creating merge requests from issues, because that creates the branch and the merge request at the same time, which sets the job on that branch to never run.

Any help in setting up rules to only run a pipeline the first time someone runs it on a new branch/merge request would be greatly appreciated!

Hello @SirRegion !

Are you using a tech stack that would enable you to do this with seeders, or migrations?

Hi @snim2!

Thanks for your quick answer! I don’t have any seeders/migrations in place.

I worked around this issue by testing whether a file already exists in the container and failing the job if it’s present (but with allow_failure set to true so it does not fail the whole pipeline).

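script:
    # Fails (and so stops the job) if the marker file already exists
    - docker exec container_name /bin/bash -c 'test ! -f testfile'
    - echo "Will only run once!"
    # Create the marker file so later runs stop at the first step
    - docker exec container_name /bin/bash -c 'touch testfile'
allow_failure: true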

This is by no means a good solution, and I would love to do this without having to rely on the filesystem of my containers, but if someone has the same issue as me, they have a workaround now.

If you come up with a better solution, please let me know!

Hi @SirRegion

It’s a shame that you can’t use a migration solution to do this for you, but the only sensible thought I had was to run a command line DB query to check whether the data is already present.

I would be inclined to put this in a separate Bash script rather than writing the if/else into the CI config. The script could count the rows in a table (or whatever) and then load the data if the result is below a limit.

This is a bit more complicated than you might have been aiming for, but it might be a bit more robust.
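As a rough illustration of the row-count idea, here is a minimal sketch. It assumes PostgreSQL with the psql client and a DATABASE_URL variable; the table name my_table, the file seed.sql, the MIN_ROWS threshold, and all function names are hypothetical placeholders, so adapt them to your own setup.

```shell
#!/usr/bin/env bash
# Sketch only: my_table, seed.sql, MIN_ROWS and DATABASE_URL are
# hypothetical placeholders, not names taken from the original pipeline.

# Load data when the table has fewer rows than this.
MIN_ROWS="${MIN_ROWS:-1}"

# Print the current row count (assumes PostgreSQL and the psql client).
# -t prints tuples only, -A disables column alignment.
count_rows() {
    psql "$DATABASE_URL" -t -A -c "SELECT COUNT(*) FROM my_table;"
}

# Load the dataset; replace with whatever import command you use.
load_data() {
    psql "$DATABASE_URL" -f seed.sql
}

# Succeed (exit status 0) when the given row count is below the threshold.
needs_seed() {
    local rows="$1"
    [ "$rows" -lt "$MIN_ROWS" ]
}

# Query the row count and load the data only if the table looks empty.
seed_if_needed() {
    local rows
    rows="$(count_rows)" || return 1
    if needs_seed "$rows"; then
        echo "Found $rows rows, loading data"
        load_data
    else
        echo "Found $rows rows, skipping load"
    fi
}
```

The CI job would then source this file and call seed_if_needed from its script section. Because the check looks at the database itself, it does not depend on a marker file in the container, and the job no longer needs allow_failure to skip the load.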


Thank you @snim2!

Playing around and trying different things, I came up with the same Bash script solution you proposed.
You have been a great help :)
