DrGlitch's Weblog
Stuff that bounced my mind.


Last update on .

And dead cluster nodes always make me feel sad. Poor things probably didn't deserve this.
Here's what happened and how I got things resolved.

My pet Kubernetes cluster had a spontaneous outage yesterday. Well, what else is there to expect when the data center has a power failure?

I found time to look into the matter today. First of all, I checked the cluster nodes - hm, OK, alive and happy.
The kube-system namespace was in worse shape: the flannel and CoreDNS pods were failing and had already gone into back-off (CrashLoopBackOff).

I uncovered a number of issues (systemd mount units for bind mounts being triggered too late, etc.) but failed to fix the outage at first, even after completely purging and re-installing the latest flannel (kubectl delete ..., kubectl apply ...).

Finally, kubelet logs on one of the nodes revealed the actual root cause: systemd-resolved was not running - and apparently neither enabled nor started by default!

Never mind the greetings from the "how did this ever work!?" department...
After adding the necessary steps to my k8s management fabric and running the task, things automagically started to settle :)
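For illustration, here is a minimal sketch of what such a fix step could look like, assuming Python, local execution with root privileges, and nothing beyond the standard `systemctl` commands. The helper names `ensure_unit_running` and `run_local` are hypothetical and are not my actual management tooling. If that tooling were the Python Fabric library, the same `systemctl` calls could be issued over SSH instead. The command runner is passed in as a parameter so the logic can be exercised without touching a real host.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

# Hypothetical runner type: takes an argv list, returns (exit code, stdout).
Runner = Callable[[Sequence[str]], Tuple[int, str]]


def run_local(argv: Sequence[str]) -> Tuple[int, str]:
    """Run a command on the local machine and return its exit code and stripped stdout."""
    proc = subprocess.run(list(argv), capture_output=True, text=True)
    return proc.returncode, proc.stdout.strip()


def ensure_unit_running(unit: str = "systemd-resolved", run: Runner = run_local) -> List[List[str]]:
    """Enable and start `unit` unless it is already enabled and active.

    Hypothetical helper, a sketch only. Returns the corrective commands that
    were issued (an empty list if the unit was already fine).
    """
    issued: List[List[str]] = []
    # `systemctl is-enabled` and `systemctl is-active` exit with status 0
    # when the unit is enabled / currently active, and non-zero otherwise.
    enabled_rc, _ = run(["systemctl", "is-enabled", unit])
    active_rc, _ = run(["systemctl", "is-active", unit])
    if enabled_rc != 0 or active_rc != 0:
        # `enable --now` enables the unit for future boots and starts it immediately.
        cmd = ["systemctl", "enable", "--now", unit]
        rc, _ = run(cmd)
        if rc != 0:
            raise RuntimeError(f"failed to enable/start {unit}")
        issued.append(cmd)
    return issued


if __name__ == "__main__":
    # Needs root (e.g. run via sudo) on a systemd-based node.
    print(ensure_unit_running())
```

Checking both states before acting keeps the step idempotent: running it repeatedly against a healthy node issues no changes. Checking "enabled" separately from "active" matters because a unit that was only started by hand will not survive the next reboot (or power failure).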

And again: it's always DNS.
