.gitlab-ci.yml: job with nohup never finishes

I use the shell executor for continuous deployment. After nohup runs, the job keeps streaming log output and never detects that the script has finished; it only stops when the job fails with a timeout error one hour later.

before_script:
  - python3 -V
stages:
  - deploy
docker-deploy:
  stage: deploy
  # job script
  script:
    - nohup celery -A ykkj_ai worker -P eventlet > celery_worker.out & # async tasks
    - nohup celery -A ykkj_ai beat > celery_beat.out & # scheduled tasks
    - nohup python3 manage.py runserver 0.0.0.0:8000 --insecure > django.out & # start django
  tags:
    # server that runs the job
    - gpu-server
  only:
    # run only on the master branch
    - master
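As far as I understand the shell executor, the runner keeps the job alive until every process holding the job's stdout/stderr pipe has exited. The script above redirects stdout but leaves stderr (and stdin) attached to the job, so the backgrounded services keep the pipe open until the 1h timeout. A minimal sketch of the fix, with `sleep 30` standing in for the real celery/Django commands:

```shell
#!/bin/sh
# Sketch: fully detach each background service so nothing keeps the
# CI job's output pipe open.  `sleep 30` is a stand-in for the real
# commands from the job above.
start_detached() {
    log="$1"; shift
    # nohup alone is not enough: stdout, stderr AND stdin must all be
    # redirected away from the job, otherwise it hangs until timeout
    nohup "$@" > "$log" 2>&1 < /dev/null &
}

start_detached celery_worker.out sleep 30  # stand-in for: celery -A ykkj_ai worker -P eventlet
start_detached celery_beat.out   sleep 30  # stand-in for: celery -A ykkj_ai beat
start_detached django.out        sleep 30  # stand-in for: python3 manage.py runserver 0.0.0.0:8000 --insecure
```

With all three streams detached, the job script exits as soon as the last foreground command returns, instead of waiting on the background processes.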

This is my CI log:

Running with gitlab-ci-multi-runner 9.5.1 (96b34cc)
  on GPU Ai计算服务器 (ad25c699)
Using Shell executor...
Running on localhost.localdomain...
Fetching changes...
Removing celerybeat-schedule.bak
Removing celerybeat-schedule.dat
Removing celerybeat-schedule.dir
Removing celerybeat.pid
Removing common/__pycache__/
Removing featrue_bokeh/__pycache__/
Removing fileapp/__pycache__/
Removing fileapp/migrations/__pycache__/
Removing mqttapp/__pycache__/
Removing mqttapp/migrations/__pycache__/
Removing scatnet/__pycache__/
Removing sklearnapp/__pycache__/
Removing sklearnapp/migrations/__pycache__/
Removing sklearnapp/utils/__pycache__/
Removing utils/__pycache__/
Removing web/__pycache__/
Removing web/migrations/__pycache__/
Removing web_crawler/__pycache__/
Removing wlw_device/__pycache__/
Removing wlw_device/migrations/__pycache__/
Removing wlw_device/service/__pycache__/
Removing wlw_feature_normal/__pycache__/
Removing wlw_feature_normal/migrations/__pycache__/
Removing wlw_plant/__pycache__/
Removing wlw_plant/migrations/__pycache__/
Removing wlw_record_info/__pycache__/
Removing wlw_record_info/migrations/__pycache__/
Removing ykkj_ai/__pycache__/
Removing ykkj_celery/__pycache__/
Removing ykkj_dtr/__pycache__/
HEAD is now at 722d084 scheduled task
From http://192.168.1.108/root/ykkj-ai
   722d084..3f5c5fe  master     -> origin/master
Checking out 3f5c5fe8 as master...
Skipping Git submodules setup
$ python3 -V
Python 3.7.6
$ nohup celery -A ykkj_ai worker -l info -P eventlet &
$ nohup celery -A ykkj_ai beat -l info &
$ nohup python3 manage.py runserver 0.0.0.0:8000 --insecure &
[2020-05-07 21:36:07,165] [autoreload.py:597] [autoreload:run_with_reloader] [INFO]- Watching for file changes with StatReloader
/root/anaconda3/lib/python3.7/site-packages/celery/platforms.py:801: RuntimeWarning: You're running the worker with superuser privileges: this is
absolutely not recommended!

Please specify a different user using the --uid option.

User information: uid=0 euid=0 gid=0 egid=0

  uid=uid, euid=euid, gid=gid, egid=egid,
[2020-05-07 21:36:16,551: INFO/MainProcess] Connected to amqp://guest:**@192.168.1.105:5672//
[2020-05-07 21:36:16,562: INFO/MainProcess] mingle: searching for neighbors
[2020-05-07 21:36:17,601: INFO/MainProcess] mingle: all alone
[2020-05-07 21:36:17,620: INFO/MainProcess] pidbox: Connected to amqp://guest:**@192.168.1.105:5672//.
[2020-05-07 21:38:00,114: INFO/MainProcess] Received task: Gpu资源调度[54496a64-596c-4c89-8852-15e315b5dd20]  
[2020-05-07 21:38:00,115: WARNING/MainProcess] GPU资源调度!
[2020-05-07 21:40:00,003: INFO/MainProcess] Received task: Gpu资源调度[8f9754d9-6494-4a26-bf6e-65399385ccd8]  
[2020-05-07 21:40:00,004: WARNING/MainProcess] GPU资源调度!
[... the same two lines repeat every two minutes ...]
[2020-05-07 22:36:00,093: INFO/MainProcess] Received task: Gpu资源调度[0d684e92-24a0-4657-a641-db882a8a4cd5]  
[2020-05-07 22:36:00,093: WARNING/MainProcess] GPU资源调度!
ERROR: Job failed: execution took longer than 1h0m0s seconds

Hi,

with the nohup ... & pattern you move the command into the background, and it keeps running even after the shell is detached/destroyed. If you run the job more than once, you accumulate quite a few processes binding ports and consuming resources, for example runserver on port 8000.

IMHO your script needs to check whether the old processes are still running before triggering a new deployment. I would also recommend a different method to deploy and run services on a VM/container.
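Such a check could look like the following sketch; the patterns are examples matching the commands in your job, adjust them to whatever your script actually starts:

```shell
#!/bin/sh
# Sketch: before deploying, stop anything left over from the previous
# run so ports and resources are free again.
stop_old() {
    # -f matches against the full command line; a missing process is
    # not an error, so ignore pkill's non-zero exit status
    pkill -f "$1" 2>/dev/null || true
}

stop_old "celery -A ykkj_ai"
stop_old "manage.py runserver"
```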

supervisord in containers and systemd services on VMs work really well in this regard.
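For the systemd route, a hypothetical unit for the Django app could look like this (service name, user, and paths are assumptions, not taken from your setup):

```ini
# /etc/systemd/system/ykkj-web.service  (hypothetical example)
[Unit]
Description=ykkj_ai Django app
After=network.target

[Service]
User=ykkj
WorkingDirectory=/opt/ykkj_ai
ExecStart=/usr/bin/python3 manage.py runserver 0.0.0.0:8000 --insecure
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

The CI script then shrinks to something like `sudo systemctl restart ykkj-web`, which returns immediately and lets systemd handle restarts, logging, and leftover processes.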

Another idea: build your environment with containers and run it via docker-compose as a daemon. On each deployment, the previous stages build a new container image and push it to the registry, and the deployment step just pulls from the registry and invokes docker-compose restart.
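The deploy job could then shrink to something like this sketch (the registry URL, image name, and compose file path are placeholders):

```yaml
# Hypothetical deploy job: earlier stages have already built and pushed
# the image; this step only pulls and restarts the services.
docker-deploy:
  stage: deploy
  script:
    - docker pull registry.example.com/root/ykkj-ai:latest
    - docker-compose -f /opt/ykkj_ai/docker-compose.yml up -d
  tags:
    - gpu-server
  only:
    - master
```

Each command here exits immediately, so the job finishes as soon as the containers are up instead of hanging on background processes.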

Cheers,
Michael