I want to download data using GitLab CI, and I have written a bash script for that.
I originally wanted to use Alpine Linux. Unfortunately, Alpine's default shell (BusyBox ash, not bash) does not handle the C-style `for (( ... ))` loop in my script. That is why I chose Debian Linux instead. Maybe someone has a solution to this problem?
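BusyBox ash does not support bash's C-style `for (( ... ))` loop, but a plain POSIX `while` loop does the same job and runs on Alpine without installing anything. A minimal sketch:

```shell
#!/bin/sh
# POSIX replacement for `for (( i=1; i<=t; i++ ))`, works in BusyBox ash.
t=5
i=1
while [ "$i" -le "$t" ]; do
    echo "week $i"
    i=$((i + 1))
done
```

Alternatively, `apk add bash` installs real bash on Alpine, and the original script can then keep its shebang and loop unchanged.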
The data I download are gzip-compressed CSV files. How do I check that these files are complete and free of errors?
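For the gzip layer, `gzip -t` verifies the archive's checksum, so a truncated or corrupt download is detected before the file is committed. A sketch, using the naming scheme from the script further down:

```shell
# Test the integrity of a downloaded .csv.gz before committing it.
file="2018/Alpha-2018-1.csv.gz"
if gzip -t "$file" 2>/dev/null; then
    echo "The file is okay"
else
    echo "The file is corrupted" >&2
    rm -f "$file"   # remove the bad file so it is never committed
fi
```

Note that `gzip -t` only proves the archive decompresses cleanly; whether the CSV content itself is complete (e.g. has the expected number of columns) would need a separate check after decompression.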
If I make a change in the repo, the pipeline defined in .gitlab-ci.yml runs twice in a row. Please have a look at my .gitlab-ci.yml and suggest improvements. I am a beginner.
My question was not clear enough. After a change in the build script, the CI runner runs twice: first for the push itself, and a second time triggered by the push my script makes. To skip the second run, Mark gave me this tip:
git commit -m "Add $Year/$Symbol-$Year-$i.csv.gz [skip ci]"; in the build script.
The [skip ci] tag is a solution. The disadvantage is that every commit message then carries this [skip ci] tag. I need a tip for keeping this tag out of the commit message.
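If your setup supports Git push options (Git 2.10+ on the client, GitLab 11.7+ on the server), the pipeline can be skipped per push instead of per commit message, so the commits stay clean. A sketch using the variables from the script below:

```shell
# Commit without any tag in the message, then skip the pipeline
# via a push option instead (needs Git 2.10+ and GitLab 11.7+).
git commit -m "Add $Year/$Symbol-$Year-$i.csv.gz"
git push -o ci.skip origin master
```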
Mark Fletcher gave me this answer:
Maybe you can add a condition? https://docs.gitlab.com/ee/ci/yaml/#only-and-except-complex
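Following the linked only/except documentation, one way to express such a condition is an `except: variables:` expression on the commit message. A sketch; the job name and the `download.sh` script name are assumptions for illustration:

```yaml
download:
  script:
    - bash ./download.sh
  except:
    variables:
      # skip the job when the commit message carries the marker
      - $CI_COMMIT_MESSAGE =~ /\[skip ci\]/
```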
I’ve looked over your example again, and I’m still not quite sure what you’re trying to achieve…
this is what I understand:
Every time you check in code to your project, you want your CI to download these CSV files from a url
These files, once downloaded, are then checked in to your repository
the thing is, when you run a push inside your job, you’ll trigger the pipeline once again. That’s probably where your second run is coming from, and as you’ve mentioned that’s not what you want to do.
It’s also not the greatest thing to be modifying the repository itself on any commit - I’m confused as to why you would want to do that. If you simply need these files available for a later, as-yet-undefined build step, then you should be pulling them as and when you need them. I’m not a big fan of storing binary data (as a .gz file is) in a git repo, there’s artifact storage for that.
If, on the other hand, your repo is all about these files - then using CI to populate it isn’t necessarily the best solution - you might get better mileage with some form of cron job.
Thank you for your answer. You suggested that I not use CI; the bash script now starts via a cron job instead.
Here I have some questions
If I run the cron job, does it provide a bash shell by default? How do I start the script for a test? I need to install wget and git. Please explain what I have to do.
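On the shell question: cron runs job commands with `/bin/sh` by default, but since the script starts with a `#!/bin/bash` shebang, cron will launch it under bash anyway. git and wget have to be installed once on the machine (on Debian, as root: `apt-get install -y git wget`). A crontab sketch; the path and schedule are assumptions:

```shell
# Install with `crontab -e`. Runs every Monday at 06:00 and logs output.
SHELL=/bin/bash
0 6 * * 1 /home/user/download.sh >> /home/user/download.log 2>&1
```

For a manual test run, simply execute the script once from a terminal: `bash /home/user/download.sh`.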
Here is a first idea of the script.
#!/bin/bash

git remote set-url origin "https://$GIT_CI_USER:$GIT_CI_PASS@gitlab.com/$CI_PROJECT_PATH.git"
git config --global user.email "user@example.com"
git config --global user.name "Max Mustermann"
git checkout master

URL="https://data.example.com"
Symbol="Alpha"
Year="2018"

# Create the target directory if it does not exist yet.
if [ -d "$Year" ]; then
    echo "$Year found"
else
    mkdir "$Year"
fi

# Determine the last week to download: for the current year, stop two
# weeks before today; for past years, take the year's last ISO week.
# "10#" forces base 10 so zero-padded weeks like "08" are not read as octal.
if [ "$Year" = "$(date +%G)" ]; then
    t=$(( 10#$(date +%V) - 2 ))
else
    # 28 December always falls in the last ISO week of a year
    # (31 December can already belong to week 1 of the next year).
    t=$(( 10#$(date -d "${Year}1228" +%V) ))
fi

for (( i=1; i<=t; i++ )); do
    wget -c "$URL/$Symbol/$Year/$i.csv.gz" -O "$Year/$Symbol-$Year-$i.csv.gz"
    # Only add and commit files that pass the gzip integrity test.
    if gzip -t "$Year/$Symbol-$Year-$i.csv.gz"; then
        echo "The file is okay"
        git add "$Year/$Symbol-$Year-$i.csv.gz"
        git commit -m "$Year/$Symbol-$Year-$i.csv.gz [skip ci]"
    else
        echo "The file is corrupted"
        rm -f "$Year/$Symbol-$Year-$i.csv.gz"
    fi
done

git push origin master
rm -rf "$CACHE_PATH/$CI_PIPELINE_ID"
exit 0
The shell script works fine on my local machine, and I uploaded it to GitLab. On my local machine, git and bash are installed by default. I don't understand how to run it on GitLab as a cron job, and do I need to install bash and git there?
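If the script is run from GitLab CI (for example via a scheduled pipeline), a plain Debian container image brings neither git nor wget, so they must be installed in `before_script`. A sketch; the image and job name are assumptions:

```yaml
image: debian:stable-slim

download:
  before_script:
    - apt-get update -qq
    - apt-get install -y -qq git wget ca-certificates
  script:
    - bash ./download.sh   # assumed name of the script above
```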
When I mentioned running something as a cron job, I wasn't thinking about scheduling it through GitLab at all, but if that's working, then great.
As for whether it's a good solution:
Is it working? Does it do what you want it to do? Then, in my mind, it’s a good solution. There may be better/other/different ways to do something, but if you have a solution that is simple, understandable and maintainable, then that is good enough.