Automated OS update with kured for servers running k8s

Posted on April 20, 2021 by Adrian Wyssmann ‐ 2 min read

If you are running a bare-metal cluster, you probably run Kubernetes on top of some Linux OS, and these systems have to be updated regularly. Some updates require a reboot, and while a server is rebooting, that node is not available to schedule workloads.

More importantly, you should cordon the node before the reboot. Cordoning marks a node as unschedulable and thus ensures that new workloads are scheduled only on un-cordoned nodes.
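A minimal bash sketch of taking a single node out of rotation before a reboot and bringing it back afterwards. The function names and the `DRY_RUN` switch are hypothetical helpers for illustration. The `kubectl cordon`, `drain` and `uncordon` commands and their flags are real, but `--delete-emptydir-data` assumes kubectl 1.20 or newer (older releases call it `--delete-local-data`).

```bash
#!/usr/bin/env bash
# Hypothetical helpers: print the kubectl commands (DRY_RUN=1, the default)
# or execute them (DRY_RUN=0).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Mark the node unschedulable and evict its pods before the reboot.
# kubectl drain cordons the node on its own; the explicit cordon only
# makes that step visible.
# --ignore-daemonsets: DaemonSet pods cannot be evicted and would block the drain.
# --delete-emptydir-data: accept losing emptyDir volumes of evicted pods.
prepare_node_for_reboot() {
  local node="$1"
  run kubectl cordon "$node"
  run kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
}

# Make the node schedulable again once it is back up.
return_node_to_service() {
  local node="$1"
  run kubectl uncordon "$node"
}
```

For example, `DRY_RUN=0 prepare_node_for_reboot worker-1` before the reboot and `DRY_RUN=0 return_node_to_service worker-1` afterwards, where `worker-1` stands for any node name.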

Current state

Even though we use [Saltstack] to manage our nodes and [Rancher] for the clusters, some steps are still manual. For example, we still cordon and un-cordon nodes by hand, which is cumbersome with 7 clusters and dozens of nodes. As we cannot update (or rather reboot) all servers at the same time, we have added a custom grain called orch_seq, which lets us perform the upgrade in a defined sequence and ensures that enough worker nodes remain available. The orch_seq is a number from 1 to 7, and the following sequences can be done together:

  • 1 (x) and 4 (y)
  • 2 (x) and 5 (y)
  • 3 (x) and 6 (y)
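The pairing rule above can be sketched as two small bash helpers. Both functions are hypothetical and only encode the listed pairs (1+4, 2+5, 3+6); the post does not say how sequence 7 is handled, so the sketch simply reports no partner for it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every pair of orch_seq values that can be
# updated together, one pair per line ("x y").
orch_seq_pairs() {
  local x
  for x in 1 2 3; do
    echo "$x $((x + 3))"
  done
}

# Hypothetical helper: print the partner of a given orch_seq value.
# Returns 1 (and prints nothing) for values outside the listed pairs, e.g. 7.
orch_seq_partner() {
  case "$1" in
    1|2|3) echo $(( $1 + 3 )) ;;
    4|5|6) echo $(( $1 - 3 )) ;;
    *) return 1 ;;
  esac
}
```

The grain itself can be set per minion with Salt's `grains.setval` execution function, e.g. `sudo salt 'node01' grains.setval orch_seq 1`, where `node01` is a placeholder minion ID.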

We then do this cluster by cluster, using the grain named context. We usually start with the dev cluster, which is the least sensitive, and then move up to the production clusters.

  1. Drain the nodes of sequences x and y that have the k8s_node and rancher roles

    1. Check which roles the nodes have

      sudo salt -C '( G@orch_seq:x or G@orch_seq:y )' grains.get roles
    2. Drain all listed nodes that have the k8s_node and rancher roles:

  2. Upgrade the nodes of sequences x and y using Salt

    sudo salt -C '( ( G@orch_seq:x or G@orch_seq:y ) and G@context:xxxxx )' pkg.upgrade
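Steps 1.2 and 2 for one pair of sequences in one cluster could be wrapped in a small bash function, sketched below in dry-run form. This is not the author's script. It assumes (hypothetically) that the node names passed in are the Kubernetes node names picked from the roles output of step 1.1, and that kubectl already points at the cluster matching the context grain. The context value `dev` in the usage example is a placeholder, since the real grain values are not given.

```bash
#!/usr/bin/env bash
# Print commands instead of running them unless DRY_RUN=0.
# In dry-run mode the arguments are printed without their original quoting.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Salt compound matcher for two orch_seq values within one cluster.
# Parentheses are kept separate from the other tokens by spaces.
salt_target() {
  printf '( G@orch_seq:%s or G@orch_seq:%s ) and G@context:%s' "$1" "$2" "$3"
}

# Usage: update_pair X Y CONTEXT NODE...
# NODE... are the k8s nodes selected by hand in step 1.1 (hypothetical interface).
update_pair() {
  local x="$1" y="$2" context="$3" target node
  shift 3
  target="$(salt_target "$x" "$y" "$context")"

  # Step 1.2: drain the Kubernetes nodes of the two sequences
  for node in "$@"; do
    run kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
  done

  # Step 2: upgrade the OS packages on all nodes of the two sequences
  run sudo salt -C "$target" pkg.upgrade
}
```

For example, `update_pair 1 4 dev node01 node04` prints the commands for the first pair of the dev cluster; with `DRY_RUN=0` they are executed. Repeating this for the pairs 2+5 and 3+6, and then for the next cluster, follows the order described above.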