Setting up Kubernetes on master and nodes

Kubernetes is used as the underlying container orchestration system for services like RiseML.

Prerequisites

Before installing Kubernetes itself, some prerequisites must be installed.

Docker

Docker is required as the container runtime for all Kubernetes containers.

apt-get update && apt-get install -y docker.io

As of Kubernetes 1.9.3, Docker 17.03 is the newest officially supported version; releases from 17.06 on are only marked as "may work". To install Docker 17.03 from the start, do the following instead:

apt-get update
apt-get install -y \
    apt-transport-https \
    ca-certificates \
    curl \
    software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -
add-apt-repository \
   "deb https://download.docker.com/linux/$(. /etc/os-release; echo "$ID") \
   $(lsb_release -cs) \
   stable"
apt-get update && apt-get install -y docker-ce=$(apt-cache madison docker-ce | grep 17.03 | head -1 | awk '{print $3}')
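
To confirm that the pinned version was actually installed, you can check the reported Docker version (it should print a 17.03.x release), for example:

docker --version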

Installing Kubernetes

To install Kubernetes, first add the apt key and the package repository:

apt-get update && apt-get install -y apt-transport-https
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb http://apt.kubernetes.io/ kubernetes-xenial main
EOF

After that, the Kubernetes packages can be installed (keep in mind that this must be done on every machine, master and nodes alike):

apt-get update && apt-get install -y kubelet kubeadm kubectl
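
As a quick sanity check (optional), the installed tools should all report the same Kubernetes version:

kubeadm version
kubectl version --client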

Initializing the master

The master is the machine that hosts the control plane and the API server. To initialize it, execute the following command on the chosen machine:

kubeadm init [--pod-network-cidr=192.168.0.0/16]

Note: The parameter given in square brackets is only required if you want to use the Flannel overlay network (more on that later).
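
For example, a master intended to be used with Flannel could be initialized like this (the CIDR is only an example value; it just has to match the Flannel configuration described below):

kubeadm init --pod-network-cidr=192.168.0.0/16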

After that, kubeadm will set up all required configuration for the master. Within a few minutes you will get a list of commands that you have to execute as a regular user, as stated in the output of the initialization:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Unfortunately this does not always work, in which case you can execute the following instead:

  mkdir -p $HOME/.kube
  sudo cat /etc/kubernetes/admin.conf > $HOME/.kube/config
  chown $(id -u):$(id -g) $HOME/.kube/config
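
Either way, you can check that kubectl can now reach the API server, for example with:

kubectl get nodes

At this point the master will typically show up as NotReady; this is expected until an overlay network has been deployed (see the next section).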

Overlay network

The next statement in the output of the initialization is:

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  http://kubernetes.io/docs/admin/addons/

I have mainly tested two overlay networks so far: Weave Net and Flannel. Both require bridged IPv4 traffic to be passed to iptables' chains, so if this is not already set, do the following on all machines:

sysctl net.bridge.bridge-nf-call-iptables=1
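
Note that this sysctl setting only lasts until the next reboot. To make it persistent you can, for example, drop it into a file under /etc/sysctl.d (the file name is an arbitrary choice) and reload:

echo "net.bridge.bridge-nf-call-iptables = 1" > /etc/sysctl.d/99-kubernetes.conf
sysctl --system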

After that you can choose one of the mentioned CNI networks.

Flannel

Download the file https://raw.githubusercontent.com/coreos/flannel/v0.9.1/Documentation/kube-flannel.yml and replace the network under data.net-conf.json.Network with the CIDR chosen during the initialization of the master (parameter --pod-network-cidr).
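
A minimal sketch of these two steps, assuming curl is available and that the manifest still ships with its default network of 10.244.0.0/16 (check the downloaded file before editing):

curl -LO https://raw.githubusercontent.com/coreos/flannel/v0.9.1/Documentation/kube-flannel.yml
# replace the default network with the CIDR passed to kubeadm init (192.168.0.0/16 in the example above)
sed -i 's#10.244.0.0/16#192.168.0.0/16#g' kube-flannel.yml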

After that the CNI network can be applied to the cluster:

kubectl apply -f kube-flannel.yml

Weave Net

Just execute the following and the network will be up and running after a short moment:

export kubever=$(kubectl version | base64 | tr -d '\n')
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$kubever"

Configuring the nodes

The last statement of the initialization of the master must be executed on every node that should be added to the cluster:

You can now join any number of machines by running the following on each node
as root:

  kubeadm join --token <token> <master-ip>:<master-port> --discovery-token-ca-cert-hash sha256:<hash>
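
Once a node has joined, it should appear on the master after a short while and eventually switch to the Ready state:

kubectl get nodes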

Updates

Checking the upgrade plan on the master

Before updating any machine, you'll have to run the following on your master node to fetch the latest kubeadm binary:

export VERSION=$(curl -sSL https://dl.k8s.io/release/stable.txt) # or manually specify a released Kubernetes version
export ARCH=amd64 # or: arm, arm64, ppc64le, s390x
curl -sSL https://dl.k8s.io/release/${VERSION}/bin/linux/${ARCH}/kubeadm > /usr/bin/kubeadm
chmod a+rx /usr/bin/kubeadm

Caution: Upgrading the kubeadm package on your system prior to upgrading the control plane causes a failed upgrade. Even though kubeadm ships in the Kubernetes repositories, it’s important to install kubeadm manually. The kubeadm team is working on fixing this limitation.

Just to be sure, check that the version of kubeadm is correct: kubeadm version

This should return something similar to:

kubeadm version: &version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.3", GitCommit:"d2835416544f298c919e2ead3be3d0864b52323b", GitTreeState:"clean", BuildDate:"2018-02-07T11:55:20Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}

After that the following has to be executed on the master machine as well:

kubeadm upgrade plan

kubeadm upgrade plan checks that your cluster can be upgraded and fetches the available target versions in a user-friendly way, ending with output like the following:

You can now apply the upgrade by executing the following command:

        kubeadm upgrade apply v1.9.3

After executing the proposed command kubeadm upgrade apply v1.9.3, it will take a while; if everything works as expected, you should get an output like:

[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.9.3". Enjoy!

But we are not done yet: kubelet and all other packages on the master and the nodes must be updated as well.

Upgrading your master and node packages

Upgrading to Kubernetes 1.10.x

Don't forget to change your kube-proxy and kubeadm-config according to https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.10.md#before-upgrading.

Default workflow for every upgrade

For each host in your cluster (referred to as $HOST below), including the master, upgrade kubelet by executing the following commands. The values for $HOST are the node names as known to Kubernetes (kubectl get nodes).

kubectl drain $HOST --ignore-daemonsets

If that fails because there are pods with local storage (usually the case for grafana, influxdb and experiments running through RiseML), you can append the flags --delete-local-data and --force.
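
The full command then looks like this (be aware that --delete-local-data discards the pods' emptyDir data and --force also evicts pods that are not managed by a controller):

kubectl drain $HOST --ignore-daemonsets --delete-local-data --force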

When running this command against the master host, this error is expected and can be safely ignored (since there are static pods running on the master):

node "titan" already cordoned
error: pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet (use --force to override): etcd-titan, kube-apiserver-titan, kube-controller-manager-titan, kube-scheduler-titan
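
After draining, the Kubernetes packages on $HOST have to be upgraded and the node made schedulable again. A minimal sketch, assuming the packages come from the apt repository configured above:

# on $HOST itself: upgrade the Kubernetes packages to the latest version available in the repository
apt-get update && apt-get install -y kubelet kubeadm kubectl
# from the master (or wherever kubectl is configured): make the node schedulable again
kubectl uncordon $HOST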

After that, the new version of kubelet should be running on each machine. You can check this by executing systemctl status kubelet.
