Container runtime failure due to IO latency

Name: Container runtime failure due to IO latency
Uploaded: Jan 31, 2020
Duration: 75 s

Jesse Noller429 subscribers

74 views

Jan 31, 2020

1:15

Example view of what happens when the IO throttle on IaaS VMs/Disks kicks on the worker nodes in an AKS cluster. Additional data/logs will be in full writeup. As you can see - an idle 5 node cluster where I execute `helm install prometheus-operation` executes fine, however when the pods/containers come online the IO latency spikes to above 100ms in some cases. This causes a cascading failure - PLEG runtime failures in the kubelet and IPC errors in the docker runtime. This triggers orphaned containers and additional IO load due to repeat boot of the containers due to the failed runtime. This also slows kernel calls and other operations at the SDN (CNI) level because the container operations for the CNI containers are blocked in IOWait.

Download

0 formats

No download links available.