Deploy with LFTP uploads all files (even unchanged ones)

I am trying to deploy to an (S)FTP server using the lftp command.

The problem is that it uploads all files, even unchanged ones. I know I could limit it to only compare file sizes, but then very minor changes don't get deployed (a ";" added, or a number changed from 0 to 1, or something like that).

Currently I use this command:

lftp -c "set net:timeout 5; set net:max-retries 3; set net:reconnect-interval-base 5; set ftp:ssl-force yes; set ftp:ssl-protect-data true; set sftp:auto-confirm yes; set ssl:verify-certificate no; open $host:$port; user $username $password; mirror $exclusions -v -c -P 10 -R …/ $remoteFolder"

With this command, it basically uploads all files whose modification time has changed. The problem is that every file's modification time is set to the moment the job starts (according to ls -la, see the screenshot below).
So I was wondering: is there a way to keep the times the files were last changed in the GitLab repository (instead of the time the job started)?

[screenshot: ls -la output showing every file with the job start time as its modification time]

Thanks for reading this block of text.

At the filesystem level? I doubt it, because the repository is "copied" every time you start the job. Or actually… you can try changing the CI Git strategy from "clone" to "fetch". This only works if you're not on a virtualized runner (e.g. docker, parallels, virtualbox, kubernetes), because these runners, by default, create a new environment for each job, so it will just fall back to the "clone" strategy.

If you're using the "shell" or "ssh" executor, or a runner that, by default, doesn't delete the repo after the job finishes, add the following at the top of your .gitlab-ci.yml file and test whether the file modified/updated dates still change.

variables:
  GIT_STRATEGY: fetch

Keep in mind that the runner needs to preserve the repository between jobs. You can also set GIT_STRATEGY per job, say for only the "deploy" job; it is basically just an environment variable.
You can have one runner on a VM that is used only for deployment and never removes the repository.
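For example, a minimal sketch of a per-job override (the job name and script line are placeholders, not something from your pipeline):

deploy:
  stage: deploy
  variables:
    # Reuse the existing working copy and only fetch new commits, so file
    # modification times are not reset to the moment the job starts.
    GIT_STRATEGY: fetch
  script:
    - ./deploy.sh   # placeholder for the actual lftp mirror command

Again, this only helps if the executor keeps the working directory around between jobs.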

I do currently use Docker (I'm on the shared GitLab.com runners); this is my .gitlab-ci.yml:

https://hastebin.com/utedajeruc.pl

Looking at the documentation, it doesn’t say that it’s impossible with docker, but I would have to look into how it might be possible (https://docs.gitlab.com/ee/ci/yaml/#git-strategy).

So, I was thinking, would it work with this scenario?

A "setup" job that uses the fetch strategy and caches its result across pipelines, with other jobs then fetching the results of this setup job.

Cached directories/files don't have this issue; if I cache /vendor/, for example, it does seem to keep the older modification date (as it doesn't get uploaded every time).

I don’t fully know how I would implement this though.

Well yeah, I suppose you can also use cache. Have you tried this method? Did it work out?
I wasn't able to reply earlier, because new accounts cannot post more than X replies per day on this forum… Tbh they should raise this limit, because it doesn't seem like any staff members are really paying attention to it anyway…

This took me way longer than it should have.

Anyways, I fixed it by adding an additional stage called “setup” where I execute the following: https://pastebin.com/bGCFd3iX

I found a utility called "git-restore-mtime", threw it into a setup job, and after it runs I create an artifact of the result so it can be used in later jobs within the pipeline.
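The pastebin has the exact commands, but the rough shape of the job is something like this (the install step is only hinted at in a comment, since it depends on the image):

setup:
  stage: setup
  script:
    # Assumes the git-restore-mtime script is on the PATH (install it however
    # the image requires; the pastebin above has the exact commands).
    # It resets each file's mtime to the date of the last commit that touched it.
    - git restore-mtime
  artifacts:
    paths:
      - '*'
    expire_in: 1 hour

Jobs in later stages download the artifact automatically, so lftp's time comparison sees the commit dates instead of the job start time.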

Hah, interesting. Glad you managed to fix it :smiley:

Yeah, and as a note, caching did seem to work somewhat. The issue was that the git fetch ran before the cache got loaded, which is why that wasn't an option.

Running Docker on a local machine gives you other options, as you can set up a shared folder where the repositories get stored (at least that's how I understood it), and on that you could use the fetch strategy. But on the public GitLab shared runners, git-restore-mtime seems like the best option currently.

I really think you should remove the

  artifacts:
    paths:
      - '*'
    expire_in: 1 hour

If you really have to do it like this, at least use cache and not artifacts…

Okay…

I mean, I could try caching for that. The only reason I used artifacts is that I want to use the files between jobs in the same pipeline.

I currently use the cache for node_modules & vendor across all jobs & pipelines. I would have to look at how to cache the repository between jobs in the same pipeline and, on top of that, node_modules & vendor across jobs & pipelines.
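For reference, the dependency cache is roughly this shape (the key here is just an example):

cache:
  key: "$CI_COMMIT_REF_SLUG"   # example key: one cache per branch
  paths:
    - node_modules/
    - vendor/

A second, pipeline-scoped entry with a key like "$CI_PIPELINE_ID" would be the natural way to share the checkout only within one pipeline, though as noted earlier the git fetch runs before the cache gets loaded.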

I was dealing with similar issues. I dealt with it by creating a custom Docker image containing FTP Deployment (which handles the change detection) and LFTP for handling the parallel upload.

If you are interested, I wrote an article about it here: https://dev.to/arxeiss/parallel-incremental-ftp-deploy-in-ci-pipeline-2511