How to cache/reuse the git repository across machines when using AWS autoscaling and docker+machine?

Hi there,
I've got AWS set up to do builds with autoscaling. This was a little arduous, but it works! Yay. The problem is that our repository is huge (about 50 GB), so it takes forever to clone each time.

When I have a dedicated machine at home acting as the gitlab-runner, I have it set up so that it keeps reusing the already cloned repository, so builds are super fast. If I were to spin up a regular EC2 instance with no autoscaling, I could do the same thing in the cloud. However, I'm trying to have the big, expensive machines running only when they're needed, and to use autoscaling to manage the CPU use.
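
For reference, the home machine is just a plain shell-executor runner, so the working copy under its builds directory survives between jobs and the default fetch strategy only pulls new commits. Roughly like this (names, URL, and paths below are placeholders, not my real config):

```toml
# /etc/gitlab-runner/config.toml on the dedicated home machine (placeholder values)
concurrent = 1

[[runners]]
  name  = "home-dedicated-runner"         # placeholder name
  url   = "https://gitlab.example.com/"   # placeholder GitLab URL
  token = "REDACTED"
  executor = "shell"
  # the shell executor keeps the checkout under builds_dir between jobs,
  # so the default git strategy (fetch) only has to pull new commits
  builds_dir = "/home/gitlab-runner/builds"
```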

Iā€™d like some combination of the following features:
ā€“ a cloned repo stays persistent on a given autoscaled machine
ā€“ the machine is put into STOPPED state, rather than terminated, when IdleTime is reached.
ā€“ when a new job comes in, the machine is put into running state, and continues with its already cloned repo.

or
ā€“ all the machines can share a cached version of the repo, so it can be much quicker to get into a usable state once the machine starts.
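
To make that last option a bit more concrete, this is roughly what I'm picturing in config.toml terms: mount a directory from the autoscaled EC2 host into every job container so a pre-seeded mirror of the repo could be reused. I don't know if this is the right mechanism, and all names, paths, and instance details here are made up:

```toml
# config.toml for the docker+machine runner (all values are placeholders)
[[runners]]
  name  = "aws-autoscale-runner"
  url   = "https://gitlab.example.com/"
  token = "REDACTED"
  executor = "docker+machine"

  [runners.docker]
    image = "alpine:latest"
    # hypothetical: mount a directory on the EC2 host into each job container,
    # so a local mirror of the repo could be reused instead of a full clone
    volumes = ["/srv/git-mirror:/git-mirror"]

  [runners.machine]
    IdleCount = 0
    IdleTime  = 1800                # seconds before an idle machine is removed
    MachineDriver = "amazonec2"
    MachineName   = "ci-runner-%s"
    MachineOptions = [
      "amazonec2-region=us-east-1",
      "amazonec2-instance-type=c5.4xlarge",
    ]
```

Again, that's just a sketch of the idea, not something I have working; the part I can't figure out is how a freshly created machine would get that mirror populated quickly in the first place.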

Any thoughts on how to accomplish that?

The time difference is currently something like 60 minutes for a full clone vs. 4 minutes if the repo is already cloned.

Thanks!
