I hope this topic is in the right place, because it is a user question and not really about the administration of GitLab. I am pretty new to continuous integration…
I will first try to explain what I would like to do. I have a GitLab repository with mainly Python scripts that digest raw data files, merge the data, and produce a clean (I hope) data file ready for production. When I push new raw data to the repository, I would like to trigger the production of this clean data file. And in the end, I would like people (standard users who are not ready for any git command) to be able to download these clean files.
To do that, I wrote the following `.gitlab-ci.yml` file in order to run a `deploy.py` script on all files in the `raw_data` directory whenever a file in this directory or its sub-directories is updated:
```yaml
build_csv:
  stage: deploy
  image: python
  artifacts:
    paths:
      - data/
  script:
    - pip install --upgrade pip
    - pip install pandas
    - python deploy/deploy.py
  only:
    changes:
      - raw_data/**/*
```
My Python script produces files in a `data/` directory. My first idea was that I would like the files in this `data/` directory to be copied back into the repository, as new or updated files in the same `data/` folder. Is this possible? And is it a good or bad practice/idea?
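From what I read in the documentation, I imagine something like the job below could do it, using a project access token stored as a masked CI/CD variable (I call it `CI_PUSH_TOKEN` here; that name and the git identity are only placeholders I invented). Please correct me if this is not the recommended way:

```yaml
# variant of my build_csv job that pushes the produced data/ files back to the repository;
# CI_PUSH_TOKEN is a name I made up for a project access token stored as a CI/CD variable
build_csv:
  stage: deploy
  image: python
  variables:
    GIT_DEPTH: 0                    # full clone, which I understand is safer when pushing back
  artifacts:
    paths:
      - data/
  script:
    - pip install --upgrade pip
    - pip install pandas
    - python deploy/deploy.py
    # commit the regenerated data/ files and push them back to the same branch
    - git config user.name "ci-bot"
    - git config user.email "ci-bot@example.com"
    - git remote set-url origin "https://oauth2:${CI_PUSH_TOKEN}@${CI_SERVER_HOST}/${CI_PROJECT_PATH}.git"
    - git add data/
    - git commit -m "Update clean data" || echo "nothing to commit"
    # -o ci.skip so this push does not trigger the pipeline again
    - git push -o ci.skip origin HEAD:${CI_COMMIT_REF_NAME}
  only:
    changes:
      - raw_data/**/*
```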
Then, using the artifacts, I can access the files produced in the `data/` folder from the GitLab web interface (CI/CD --> Jobs). What would be the best way to make these output files easy to reach from, for example, the home page of my repository? I see that I can write a permalink to the latest artifacts, so maybe one possibility is to put a link in the README? Again, is this a good practice, or is there a better solution?
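If I understand the documentation correctly, such a permalink would look something like `https://gitlab.example.com/<namespace>/<project>/-/jobs/artifacts/main/browse?job=build_csv` (the host and branch name are placeholders), so a link like this in the README could let people browse and download the latest `data/` files without touching git.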
For example, is it possible to copy or push these files to another repository? Or to a custom cloud?
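To make the question more concrete, I imagine a second job along these lines, where the target repository URL and the `DATA_REPO_TOKEN` variable are only placeholders I would have to set up myself:

```yaml
# hypothetical job that copies the produced data/ files into another repository
publish_data:
  stage: deploy
  image: python                     # this image also ships git, as far as I know
  needs: ["build_csv"]              # so the data/ artifacts from build_csv are downloaded here
  script:
    - git clone "https://oauth2:${DATA_REPO_TOKEN}@gitlab.example.com/my-group/clean-data.git" out
    - mkdir -p out/data
    - cp -r data/. out/data/
    - cd out
    - git add data/
    - git -c user.name="ci-bot" -c user.email="ci-bot@example.com" commit -m "Update clean data" || echo "nothing to commit"
    - git push origin HEAD
  only:
    changes:
      - raw_data/**/*
```

But I do not know whether this kind of mirroring job is considered good practice either.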
Thanks for your comments and advice.