Kubernetes - a brief introduction to container orchestration

Posted in development on November 10, 2020 by Adrian Wyssmann ‐ 14 min read

This is a “mini” introduction to Kubernetes - k8s for short - which gives you an idea of what it is and of the general concepts and terminology, without going into details. It is also not complete; I will cover the different aspects in more detail in separate posts. I hope this post is still helpful for you though.

What is Kubernetes

A long time ago I wrote about docker, where I also briefly mentioned docker swarm, one of the many container orchestration tools. As of today however, the best-known container orchestration tool is kubernetes, which basically won over the other tools out there and is thus probably the most widely used orchestrator.

When using docker or docker-compose commands you usually execute them on a particular node. If that machine is down or breaks, your containers stop running and are not available, so you need to use another node and start the container(s) there. This is where container orchestrators come into play by managing this for you. Actually they do much more; Kubernetes, for example, can:

  • automatically schedule containers based on resource usage and constraints
  • automatically replace and reschedule containers from failed nodes
  • kill and restart containers which do not respond to health checks, based on existing rules/policies
  • automatically scale applications based on resource usage like CPU and memory
  • discover services (groups of containers) automatically, and load-balance requests between the containers of a given service
  • roll out and roll back new versions/configurations automatically, without introducing any downtime
  • manage secrets and configuration details for an application without re-building the respective images
  • orchestrate storage: automatically mount local, external, or cloud storage solutions to the containers
  • execute batch jobs

Components of Kubernetes

To run Kubernetes you usually need more than one node, i.e. server. There are also ways to run Kubernetes on a single node, for example by using minikube. As you can read at kubernetes.io, there are several components involved:

[Figure: Kubernetes cluster components - source: https://kubernetes.io/docs/concepts/overview/components/]

  • Cluster: a collection of nodes, consisting of control plane nodes and worker nodes
  • Node: an entity (server, VM) which provides resources like CPU, memory, …
  • Control Plane Node: a node on which the control plane components are running
  • Worker Node: a node on which the workload, i.e. pods (containers), is running

Control Plane

As the name implies, the control plane controls the Cluster: it makes

global decisions about the cluster (for example, scheduling), as well as detecting and responding to cluster events (for example, starting up a new pod when a deployment’s replicas field is unsatisfied).

It consists of several components, as you can also see in the figure linked above.

  • kube-apiserver: the API server, which is exposed and acts as the front end of the control plane
  • etcd: key-value store for all cluster data
  • kube-scheduler: watches for newly created pods with no assigned node, and selects a node for them to run on
  • kube-controller-manager: runs the controller processes:
      - Node controller: responsible for noticing and responding when nodes go down
      - Replication controller: responsible for maintaining the correct number of pods for every replication controller object in the system
      - Endpoints controller: populates the Endpoints object (that is, joins Services & Pods)
      - Service Account & Token controllers: create default accounts and API access tokens for new namespaces
  • cloud-controller-manager: embeds cloud-specific control logic to interact with your cloud provider’s API

Check out Controller, kube-scheduler and etcd for more details.

Worker Nodes

Worker nodes (or simply nodes) also have several components:

  • kubelet: an agent which makes sure that the containers in a Pod are running and healthy
  • kube-proxy: a network proxy which maintains network rules on nodes and implements part of the Service concept. kube-proxy uses the operating system packet filtering layer if there is one and it’s available; otherwise it forwards the traffic itself
  • container runtime: the software responsible for running containers, which can be Docker, containerd, CRI-O or any other implementation of the Kubernetes CRI (Container Runtime Interface)

Check out Nodes for more details.

Concepts

Besides understanding the key components of Kubernetes, it’s also important to understand its essential concepts and terms. I will briefly introduce them and provide links to the official documentation. My aim here is to give a principal understanding of what these concepts are.

Containers

A container is an immutable entity which contains an application and all its dependencies, decoupling it from the underlying host infrastructure so that it can run anywhere in the same way. My post on Docker explains the mechanism behind this with the example of Docker images.

It’s important to note that Kubernetes does not directly deal with containers but with pods.

Workloads

A workload is an application running on Kubernetes.

Whether your workload is a single component or several that work together, on Kubernetes you run it inside a set of Pods.

A pod represents a set of running containers on your cluster. A Pod has a defined lifecycle, and if a pod fails or terminates on a node, that’s it: you would need to create a new pod.
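For illustration, a minimal Pod manifest could look like this (a sketch; name and image are arbitrary):

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx:1.14.2
    ports:
    - containerPort: 80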

Recreating failed pods is handled automatically in Kubernetes by means of workload resources, which manage a set of Pods on your behalf:

Deployments

A Deployment is a declarative description of a desired state. The Deployment Controller changes the actual state to the desired state. There are different use cases, for example creating a Deployment to roll out a ReplicaSet. The controller then ensures that the pods are created and checks that they are running. If you want to change that state, for example by adding additional containers, you have to update the declaration. An example of such a declarative description is the following, which creates a ReplicaSet to bring up three nginx Pods:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80

I don’t go into too much detail here, but you can read more about it here.

Important to note is that declarations are usually written in YAML - also check out Manage deployments for more details.

ReplicaSet

The purpose of a ReplicaSet is to ensure a given number of identical pods is running at any time.

Deployment is a higher-level concept that manages ReplicaSets and provides declarative updates to Pods along with a lot of other useful features. Therefore, we recommend using Deployments instead of directly using ReplicaSets, unless you require custom update orchestration or don’t require updates at all.
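For completeness, here is a minimal sketch of a standalone ReplicaSet, mirroring the Deployment example above (you would normally let a Deployment manage this for you):

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: nginx-replicaset
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2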

StatefulSet

StatefulSets are used to manage stateful applications and also guarantee the ordering and uniqueness of their Pods.

Unlike a Deployment, a StatefulSet maintains a sticky identity for each of their Pods. These pods are created from the same spec, but are not interchangeable: each has a persistent identifier that it maintains across any rescheduling.

StatefulSets are valuable for applications that require one or more of the following:

  • Stable, unique network identifiers
  • Stable, persistent storage
  • Ordered, graceful deployment and scaling
  • Ordered, automated rolling updates

In a StatefulSet the N replicas are created sequentially ({0..N-1}), and all predecessors must be Running and Ready before the next Pod is created. The same applies to deletion, which happens in reverse sequential order ({N-1..0}), whereby a Pod is terminated only when all of its successors are completely shut down.

StatefulSet Pods also have a unique identity that is comprised of:

  • Ordinal Index: an integer ordinal, from 0 up through N-1, that is unique over the Set
  • Stable Network ID: the hostname is constructed as $(statefulset name)-$(ordinal), and the domain managed by the governing service as $(service name).$(namespace).svc.cluster.local; each pod itself gets a DNS name $(podname).$(governing service domain)
  • Stable Storage: Kubernetes creates one PersistentVolume for each VolumeClaimTemplate; when a Pod is (re)scheduled onto a node, its volumeMounts mount the PersistentVolumes associated with its PersistentVolumeClaims
  • Pod Name Label: the pod gets a label statefulset.kubernetes.io/pod-name
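To make this concrete, here is a minimal sketch of a StatefulSet with its governing headless Service, closely following the official example (names, image and storage size are illustrative). The resulting pods would be named web-0, web-1 and web-2:

apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  clusterIP: None        # headless service, provides the stable network domain
  selector:
    app: nginx
  ports:
  - port: 80
    name: web
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: "nginx"   # must match the governing (headless) Service above
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:  # one PersistentVolumeClaim (and PV) per replica, for stable storage
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi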

There are much more details to know about StatefulSets which I will cover in a later post.

DaemonSet

A DaemonSet ensures that a copy of a pod runs on all nodes. Typically these are daemons for storage, logging or monitoring. A DaemonSet also ensures that pods are started on newly added nodes.
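A minimal sketch of a DaemonSet running a log agent on every node (name and image are illustrative):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-agent
  namespace: kube-system
  labels:
    app: log-agent
spec:
  selector:
    matchLabels:
      app: log-agent
  template:
    metadata:
      labels:
        app: log-agent
    spec:
      containers:
      - name: log-agent
        image: fluent/fluentd:v1.11   # illustrative; any node-level daemon works the same way
        resources:
          limits:
            memory: 200Mi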

Job

A Job creates one or more pods which are expected to terminate successfully. When a Job completes, no more Pods are created, but the Pods are not deleted either, so you can still view the logs. It can also happen that either a container in a Pod fails or the entire pod fails. What happens in these cases depends on the configuration parameters of the job; see Handling Pod and Container Failures for details.

Here is an example of a job which computes π to 2000 places and prints it out:

apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl",  "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
  backoffLimit: 4

CronJob

A CronJob is a job which runs periodically on a given schedule, written in Cron format, based on the timezone of the kube-controller-manager. The example below prints the current time and a hello message every minute:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure

There are some limitations, so one should take care to properly configure startingDeadlineSeconds and concurrencyPolicy to avoid ending up with two concurrent jobs, or none at all.
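For example (the values here are illustrative), the CronJob above could be hardened like this:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  startingDeadlineSeconds: 200   # skip a run if it cannot start within 200s of its scheduled time
  concurrencyPolicy: Forbid      # do not start a new run while the previous one is still running
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure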

Networking

The Networking aspect of Kubernetes is quite important, so let’s have a look at the components there.

Service

A Service is an abstract way to expose an application running on a set of Pods as a network service. We already know that pods are nonpermanent resources, which can be dynamically created and destroyed. In addition, each pod gets its own IP address, so those are not permanent either. Therefore you have the Service object, which targets a set of pods by a selector.

The example below creates a new Service object named my-service, which targets TCP port 9376 on any Pod with the app=MyApp label.

apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: MyApp
  ports:
    - protocol: TCP
      port: 80
      targetPort: 9376

Kubernetes assigns this Service an IP address ("cluster IP"). The controller for the Service selector continuously scans for Pods that match its selector, and then updates the related Endpoints object (also named my-service).

However, there are also services without selectors, for which you provide the Endpoints object yourself:

apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  ports:
    - protocol: TCP
      port: 80
      targetPort: 9376
---
apiVersion: v1
kind: Endpoints
metadata:
  name: my-service
subsets:
  - addresses:
      - ip: 192.0.2.42
    ports:
      - port: 9376

There are different ServiceTypes which allow different behaviors - which is especially important if you want to expose a Service on an external IP address, outside of your cluster (a NodePort sketch follows after the list):

  • ClusterIP: (default) Exposes the Service on a cluster-internal IP.
  • NodePort: Exposes the Service on each Node’s IP at a static port so the service is reachable at <NodeIP>:<NodePort>. The port should be in a range of 30000-32767
  • LoadBalancer: Exposes the Service externally using a cloud provider’s load balancer. NodePort and ClusterIP Services, to which the external load balancer routes, are automatically created.
  • ExternalName: Maps the Service to the contents of the externalName field (e.g. foo.bar.example.com)
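As a sketch, the my-service example from above exposed as a NodePort (the explicit nodePort is optional and illustrative):

apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  type: NodePort
  selector:
    app: MyApp
  ports:
    - protocol: TCP
      port: 80
      targetPort: 9376
      nodePort: 30007   # optional; if omitted, a port from the 30000-32767 range is allocated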

Endpoints

As you have seen above, an Endpoints object is usually created by the Service object and contains the network addresses and ports where the service is running. To my understanding it only supports IP addresses, which can be quite limiting if you have a dynamic external endpoint.

EndpointSlices

We have seen above what an Endpoints object is, but as all network endpoints for a Service were stored in a single Endpoints resource, those resources could get quite large. Thus Kubernetes introduced EndpointSlices, which contain references to a set of network endpoints. EndpointSlices group network endpoints together by unique combinations of protocol, port number, and Service name.
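A minimal sketch of an EndpointSlice as it might be created for the my-service example (name and address are illustrative; at the time of writing the API is still discovery.k8s.io/v1beta1):

apiVersion: discovery.k8s.io/v1beta1
kind: EndpointSlice
metadata:
  name: my-service-abc
  labels:
    kubernetes.io/service-name: my-service   # ties the slice to its Service
addressType: IPv4
ports:
  - name: ""          # matches the (unnamed) port of the Service
    protocol: TCP
    port: 9376
endpoints:
  - addresses:
      - "192.0.2.42"
    conditions:
      ready: true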

Ingress

An Ingress object manages external access to the services in a cluster and may provide load balancing and SSL termination.

It requires an Ingress Controller so that the Ingress resource can take effect. The Ingress resource requires rules which define how and which traffic is being routed. Without rules, or if none of the rules apply, traffic goes to the default backend.
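A minimal sketch of an Ingress routing all traffic for one host to the my-service Service from above (the host name is illustrative):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: minimal-ingress
spec:
  rules:
  - host: foo.bar.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-service
            port:
              number: 80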

Ingress Controllers

As mentioned above, Ingress Controllers are required for Ingress resources to work. The Kubernetes project currently supports and maintains the GCE and nginx controllers, but there are additional controllers available.

Network Policies

Network Policies allow control of traffic flow at the IP address or port level (OSI layer 3 or 4). By default, pods are non-isolated; they accept traffic from any source. So one can isolate a pod using a network policy, which allows you to specify how a pod is allowed to communicate with various network “entities” (a sketch follows after the list):

  • Other pods that are allowed
  • Namespaces that are allowed
  • IP blocks (exception: traffic to and from the node where a Pod is running is always allowed, regardless of the IP address of the Pod or the node)
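A minimal sketch of such a policy (labels are illustrative): it isolates pods labeled role: db for incoming traffic and only allows connections from pods labeled role: frontend on TCP port 5432:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-db
spec:
  podSelector:
    matchLabels:
      role: db           # the pods this policy applies to
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              role: frontend
      ports:
        - protocol: TCP
          port: 5432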

Storage

Kubernetes storage is a very big topic; here I will simply give a quick intro without going into any details - I will cover that in further posts.

Volumes

We have already seen volumes in Docker, and we know a volume is a directory on disk or in another container. Such ephemeral volumes live as long as the container does, which is not ideal in case you want to persist data. The principle in Kubernetes is the same, but volumes are managed by Kubernetes and are associated with Pods, not with containers. Kubernetes supports a variety of volume types, and a pod can use multiple volume types by specifying a StorageClass. A StorageClass describes the types (or “classes”) of storage a cluster offers (a sketch follows after the list). This description includes:

  • provisioner: determines what volume plugin is used for provisioning PVs
  • reclaim policy: defines what happens to the storage when it is released, either Delete or Retain (Delete is the default for dynamically provisioned storage)
  • parameters: parameters that describe volumes belonging to the storage class (depend on the provisioner)
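A minimal sketch of a StorageClass (assuming an AWS cluster; the provisioner and parameters are illustrative and depend on your environment):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
provisioner: kubernetes.io/aws-ebs   # assumption: AWS EBS; other in-tree or CSI provisioners exist
parameters:
  type: gp2
reclaimPolicy: Retain                # keep the underlying storage when the PV is released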

Ephemeral Volumes

According to Wiktionary, ephemeral means “something which lasts for a short period of time”, so Ephemeral Volumes are short-lived volumes - as we already heard above. This is useful when the application (pod) does not care whether the data is stored persistently across restarts, for example cached data or read-only data like configuration files. Kubernetes supports different kinds of such storage (a sketch of the first kind follows after the list):

  • emptyDir: empty at Pod startup, with storage coming locally from the kubelet base directory (usually the root disk) or RAM
  • configMap, downwardAPI, secret: inject different kinds of Kubernetes data into a Pod
  • CSI ephemeral volumes: similar to the previous volume kinds, but provided by special CSI drivers which specifically support this feature
  • generic ephemeral volumes, which can be provided by all storage drivers that also support persistent volumes
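A minimal sketch of a Pod using an emptyDir volume for cached data (names are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: cache-pod
spec:
  containers:
  - name: app
    image: nginx:1.14.2
    volumeMounts:
    - mountPath: /cache
      name: cache-volume
  volumes:
  - name: cache-volume
    emptyDir: {}        # created empty at Pod startup, removed when the Pod goes away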

Persistent Volumes

Kubernetes provides the possibility to provision and consume Persistent Volumes without the user having to care about the details of how this is done. For this purpose there are two resources which facilitate the static or dynamic creation of volumes:

  • A PersistentVolume (PV) is a piece of storage in the cluster which contains details of the real storage
  • A PersistentVolumeClaim (PVC) is a request for storage by a user. It consumes a PV

The PVC requests storage (a PV) of the defined StorageClass, which the cluster then tries to provision dynamically. If the StorageClass is set to "", dynamic provisioning is disabled, and the cluster administrator has to create the PVs herself - this is called static provisioning.
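A minimal sketch of static provisioning (capacity, path and names are illustrative; hostPath is only suitable for single-node testing):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-demo
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /mnt/data           # backing storage; in real clusters this would be NFS, EBS, etc.
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-demo
spec:
  storageClassName: ""        # disable dynamic provisioning, bind to a statically created PV
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi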

You can read more about the lifecycle of a volume and claim here: https://kubernetes.io/docs/concepts/storage/persistent-volumes/#lifecycle-of-a-volume-and-claim

Configuration

ConfigMaps

A ConfigMap is an object to store non-confidential data in key-value pairs (max. 1 MiB). This allows you to decouple environment-specific configuration from your container images, so that the images stay portable. An example:

apiVersion: v1
kind: ConfigMap
metadata:
  name: game-demo
data:
  # property-like keys; each key maps to a simple value
  player_initial_lives: "3"
  ui_properties_file_name: "user-interface.properties"

  # file-like keys
  game.properties: |
    enemy.types=aliens,monsters
    player.maximum-lives=5    
  user-interface.properties: |
    color.good=purple
    color.bad=yellow
    allow.textmode=true    

There are four different ways that you can use a ConfigMap to configure a container inside a Pod:

  1. Command line arguments to the entrypoint of a container
  2. Environment variables for a container
  3. Add a file in read-only volume, for the application to read
  4. Write code to run inside the Pod that uses the Kubernetes API to read a ConfigMap

I will cover the exact how in another post; for now it’s just good to know that you have some flexibility.
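Still, as a teaser, here is a minimal sketch of options 2 and 3, consuming the game-demo ConfigMap from above (pod and container names are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: configmap-demo-pod
spec:
  containers:
    - name: demo
      image: busybox
      command: ["sleep", "3600"]
      env:
        - name: PLAYER_INITIAL_LIVES       # option 2: value injected as an environment variable
          valueFrom:
            configMapKeyRef:
              name: game-demo
              key: player_initial_lives
      volumeMounts:
        - name: config
          mountPath: "/config"             # option 3: keys appear as read-only files
          readOnly: true
  volumes:
    - name: config
      configMap:
        name: game-demo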

Secrets

A Secret is similar to a ConfigMap, but is intended to store sensitive data like passwords.

A Secret can be used with a Pod in three ways:

  1. As files in a volume mounted on one or more of its containers.
  2. As container environment variable.
  3. By the kubelet when pulling images for the Pod.

There are different types of secrets, but usually they all contain a data field which holds the secret as base64-encoded strings. Secrets reside in a namespace and can only be referenced by Pods in that same namespace.
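A minimal sketch of an Opaque Secret (name and values are illustrative):

apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
type: Opaque
data:
  username: YWRtaW4=        # base64 of "admin"
  password: cGFzc3dvcmQ=    # base64 of "password"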

Note that base64 is not an encryption, just an encoding, so if you can access the secret object you can also extract the data from it. It is therefore also not recommended to store secrets in source control. There are other possibilities which ensure that your data is actually encrypted, but that’s for another post.

What next

I want to create some subsequent posts covering the different aspects in more detail. Please stay tuned.