Kernel Panic with Docker on some Cloud-optimized Ubuntu Kernels

Hey all - GitLab Distribution Team Member here :wave:

In internal testing, we’ve seen a kernel bug/panic triggered when running a container on Ubuntu 20.04 with one of the recently updated cloud-provider-optimized kernels.

It looks like the Ubuntu team is aware and working on it. Here’s the Launchpad bug: Bug #1977919 “Docker container creation causes kernel oops on li...” : Bugs : linux-aws-5.13 package : Ubuntu.

Affected kernels (from Comment #21 : Bug #1977919 : Bugs : linux-aws-5.13 package : Ubuntu):

focal linux-aws-5.13 5.13.0-1028.31~20.04.1
focal linux-azure-5.13 5.13.0-1028.33~20.04.1
focal linux-gcp-5.13 5.13.0-1030.36~20.04.1
focal linux-oracle-5.13 5.13.0-1033.39~20.04.1
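To check a host against the builds above, a short script can compare the running kernel release with the list. This is a minimal sketch, not an official tool, and it rests on one assumption: that the ABI number in each package version (the "1028" in 5.13.0-1028.31~20.04.1) is the same number `uname -r` reports (e.g. 5.13.0-1028-aws), following Ubuntu's usual kernel naming. The names `AFFECTED_ABI` and `is_listed_affected` are hypothetical and chosen for illustration.

```python
import platform
import re

# Builds from the Launchpad comment, keyed by cloud flavour suffix.
# ASSUMPTION: the ABI number in the package version (e.g. the 1028 in
# 5.13.0-1028.31~20.04.1) matches the one in `uname -r` (e.g. 5.13.0-1028-aws).
AFFECTED_ABI = {
    "aws": 1028,     # linux-aws-5.13    5.13.0-1028.31~20.04.1
    "azure": 1028,   # linux-azure-5.13  5.13.0-1028.33~20.04.1
    "gcp": 1030,     # linux-gcp-5.13    5.13.0-1030.36~20.04.1
    "oracle": 1033,  # linux-oracle-5.13 5.13.0-1033.39~20.04.1
}

# Matches release strings shaped like "5.13.0-1028-aws".
RELEASE_RE = re.compile(r"^(\d+\.\d+\.\d+)-(\d+)-([a-z]+)$")


def is_listed_affected(release: str) -> bool:
    """Return True only if the release string matches one of the listed builds."""
    match = RELEASE_RE.match(release)
    if match is None:
        return False  # not a release string in the expected cloud-kernel shape
    base, abi, flavour = match.group(1), int(match.group(2)), match.group(3)
    return base == "5.13.0" and AFFECTED_ABI.get(flavour) == abi


if __name__ == "__main__":
    release = platform.release()  # same value as `uname -r` on Linux
    status = "matches a listed affected build" if is_listed_affected(release) else "not in the list"
    print(f"{release}: {status}")
```

Because this only matches the exact builds listed in the Launchpad comment, a negative result does not prove a kernel is safe; treat it as a quick first check alongside the bug report itself.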

In GitLab environments, this could show up as problems with:

  • GitLab Runner hosts using the Docker executor, which crash when trying to launch the helper/build containers.
  • GitLab Runners using the Docker+Machine executor, which launch an Ubuntu 20.04 instance with an updated kernel to host Docker-based builds and then can’t connect to the VM that docker-machine launched.
  • Docker-based GitLab installations that are set to start automatically on a host that has been updated to one of the affected kernels.

If you run containers on Ubuntu 20.04 hosts in AWS, GCP, Azure, or Oracle Cloud, check your kernel upgrades and hold off on updating until Ubuntu and the cloud providers roll out a fixed kernel.


According to the linked Launchpad bug report, fixes have been committed and an updated kernel appears to be available in GCP. I would expect the other cloud providers to follow shortly.


I’m having the same problem with an Azure Virtual Machine Scale Set (VMSS). I created the VM it’s based on in Azure. Here’s the hostnamectl output:
Virtualization: microsoft
OS: Ubuntu 20.04.4 LTS
Kernel: Linux 5.13.0-1023-azure
I’m running Docker version 20.10.12, build 20.10.12-0ubuntu~20.04.1.

Docker runs fine on that VM. I then generalize the VM, create an image from it, and build a Virtual Machine Scale Set from that image.
On each instance in that VMSS, docker pull works fine: images pull successfully from our Azure Container Registry, and docker pull alpine:latest succeeds too. However, the first docker run on those images causes a kernel panic. If I SSH in and run docker run, I get kicked out of the VM; after I SSH back in, docker run works all day long. But that first docker run panics every time, which is not good since I’m using the VMSS for a CI/CD pipeline.