Setting up Kubernetes on master and nodes
Kubernetes is used as the fundamental container management system for services like RiseML.
Prerequisites
Before installing Kubernetes itself, some prerequisites must be installed.
Docker
Docker is required for the creation and management of all Kubernetes containers.
apt-get update && apt-get install -y docker.io
As of Kubernetes 1.9.3, only Docker 17.03 is officially supported; versions from 17.06 onwards are merely marked as "may work". To install Docker 17.03 right from the beginning, just do the following:
apt-get update
apt-get install -y \
    apt-transport-https \
    ca-certificates \
    curl \
    software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -
add-apt-repository \
    "deb https://download.docker.com/linux/$(. /etc/os-release; echo "$ID") \
    $(lsb_release -cs) \
    stable"
apt-get update && apt-get install -y docker-ce=$(apt-cache madison docker-ce | grep 17.03 | head -1 | awk '{print $3}')
Installing Kubernetes
To install Kubernetes, the official apt repository has to be added first:
apt-get update && apt-get install -y apt-transport-https
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb http://apt.kubernetes.io/ kubernetes-xenial main
EOF
After that, all Kubernetes packages can be installed (keep in mind that this must be done on every machine):
apt-get update && apt-get install -y kubelet kubeadm kubectl
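As an optional precaution (not strictly required for the setup), the freshly installed packages can be pinned so that a routine apt-get upgrade does not unintentionally move the cluster to a newer Kubernetes version:

```shell
# Pin the Kubernetes packages so a routine 'apt-get upgrade' does not
# unintentionally pull in a newer Kubernetes version
apt-mark hold kubelet kubeadm kubectl

# Release the hold again before a deliberate upgrade:
# apt-mark unhold kubelet kubeadm kubectl
```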
Initializing the master
The master is the machine containing the control plane and the API server. To initialize the master just execute the following command on the chosen machine:
kubeadm init [--pod-network-cidr=192.168.0.0/16]
Note: The parameter given in square brackets is only necessary if you want to use the Flannel overlay network (more on that later)
After that, kubeadm will initialize all the required configurations and settings for the master. Within the next few minutes you'll get a list of commands you'll have to execute as a regular user.
As stated by the initialization of the master:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Unfortunately, this sometimes doesn't work, in which case you can execute the following instead:
mkdir -p $HOME/.kube
sudo cat /etc/kubernetes/admin.conf > $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
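Either way, you can then verify that kubectl is able to talk to the cluster; note that the master will typically report NotReady until an overlay network has been deployed (see below):

```shell
# Should list the master node; its status usually stays NotReady
# until a pod network has been deployed
kubectl get nodes
```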
Overlay network
The next statement from the initialization of the master:
You should now deploy a pod network to the cluster. Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at: http://kubernetes.io/docs/admin/addons/
I have mainly tested two overlay networks so far: Weave Net and Flannel. Both require bridged IPv4 traffic to be passed to iptables' chains, so if not already set, do the following on all machines:
sysctl net.bridge.bridge-nf-call-iptables=1
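This only sets the flag until the next reboot. To make it persistent, the setting can also be written to a sysctl configuration file; the file name below is an arbitrary convention, not something Kubernetes mandates:

```shell
# Persist the bridge-netfilter setting across reboots
# (/etc/sysctl.d/k8s.conf is an arbitrary file name)
echo "net.bridge.bridge-nf-call-iptables = 1" > /etc/sysctl.d/k8s.conf
```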
After that you can choose one of the mentioned CNI networks.
Flannel
Download the file https://raw.githubusercontent.com/coreos/flannel/v0.9.1/Documentation/kube-flannel.yml and replace the IP range under data.net-conf.json.Network with the one chosen during the initialization of the master (parameter --pod-network-cidr).
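Assuming the manifest still contains Flannel's default network 10.244.0.0/16, the edit can be sketched as a sed one-liner; the target CIDR 192.168.0.0/16 is only an example and must match your --pod-network-cidr value:

```shell
# Download the Flannel manifest mentioned above
curl -sSLo kube-flannel.yml https://raw.githubusercontent.com/coreos/flannel/v0.9.1/Documentation/kube-flannel.yml

# Swap Flannel's assumed default pod network (10.244.0.0/16) for the
# CIDR that was passed to 'kubeadm init --pod-network-cidr'
sed -i 's|10.244.0.0/16|192.168.0.0/16|' kube-flannel.yml
```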
After that the CNI network can be applied to the cluster:
kubectl apply -f kube-flannel.yml
Weave Net
Just execute the following and the network will be up and running after a short moment:
export kubever=$(kubectl version | base64 | tr -d '\n')
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$kubever"
Configuring the nodes
The last statement of the initialization of the master must be executed on every node that should be added to the cluster:
You can now join any number of machines by running the following on each node as root:
kubeadm join --token <token> <master-ip>:<master-port> --discovery-token-ca-cert-hash sha256:<hash>
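If the join command from the init output was lost, a fresh token including the complete join command can be generated on the master; the --print-join-command flag should be available from kubeadm 1.9 onwards:

```shell
# Run on the master: creates a new bootstrap token and prints the full
# 'kubeadm join ...' command including the CA cert hash
kubeadm token create --print-join-command
```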
Updates
Checking the upgrade plan on the master
Before updating all machines you'll have to run the following on your master node:
export VERSION=$(curl -sSL https://dl.k8s.io/release/stable.txt) # or manually specify a released Kubernetes version
export ARCH=amd64 # or: arm, arm64, ppc64le, s390x
curl -sSL https://dl.k8s.io/release/${VERSION}/bin/linux/${ARCH}/kubeadm > /usr/bin/kubeadm
chmod a+rx /usr/bin/kubeadm
Caution: Upgrading the kubeadm package on your system prior to upgrading the control plane causes a failed upgrade. Even though kubeadm ships in the Kubernetes repositories, it’s important to install kubeadm manually. The kubeadm team is working on fixing this limitation.
Just to be sure, let's check that the version of kubeadm is correct:
kubeadm version
Returning something similar to this:
kubeadm version: &version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.3", GitCommit:"d2835416544f298c919e2ead3be3d0864b52323b", GitTreeState:"clean", BuildDate:"2018-02-07T11:55:20Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
After that the following has to be executed on the master machine as well:
kubeadm upgrade plan
kubeadm upgrade plan checks that your cluster is upgradeable and fetches the versions available to upgrade to in a user-friendly way, producing output like the following:
You can now apply the upgrade by executing the following command:
kubeadm upgrade apply v1.9.3
Executing the proposed command kubeadm upgrade apply v1.9.3 will take a while; if everything works as expected, you should get an output like:
[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.9.3". Enjoy!
But we are not done yet: we must update kubelet and all the other packages on the master and the nodes as well.
Upgrading your master and node packages
Upgrading to Kubernetes 1.10.x
Don't forget to adjust your kube-proxy and kubeadm-config according to https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.10.md#before-upgrading.
Default workflow for every upgrade
For each host in your cluster (referred to as $HOST below), including the master, upgrade kubelet by executing the following steps. The values for $HOST are the node names known to Kubernetes.
- Prepare the host for maintenance, marking it unschedulable and evicting the workload:
kubectl drain $HOST --ignore-daemonsets
If that doesn't work because there are pods with local storage (usually the case for grafana, influxdb and experiments running through RiseML), you can append the flags --delete-local-data and --force.
When running this command against the master host, this error is expected and can be safely ignored (since there are static pods running on the master):
node "titan" already cordoned error: pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet (use --force to override): etcd-titan, kube-apiserver-titan, kube-controller-manager-titan, kube-scheduler-titan
- Upgrade the Kubernetes package versions on the $HOST node.
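On Ubuntu, this step could look like the following sketch; the pinned version 1.9.3-00 is only an example and must be adjusted to the version the control plane was upgraded to:

```shell
# Upgrade the Kubernetes packages on the drained host
# (the version pin 1.9.3-00 is an example, not a prescribed value)
apt-get update
apt-get install -y kubelet=1.9.3-00 kubeadm=1.9.3-00 kubectl=1.9.3-00

# Restart kubelet so the new binary is picked up
systemctl daemon-reload
systemctl restart kubelet
```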
After that the new version of kubelet should run on the node machines. You can check this by executing:
systemctl status kubelet
- Bring the host back online by marking it schedulable again:
kubectl uncordon $HOST
Notes
- Disable swap on all machines, otherwise you'll be unable to start Kubernetes on them. It is possible to suppress the resulting errors, but this fallback is likely to be disabled in a future release of Kubernetes.
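Disabling swap can be sketched as follows; the fstab edit assumes that swap entries contain the word "swap" in the type column, which is the usual layout:

```shell
# Turn off all active swap immediately
swapoff -a

# Comment out swap entries in /etc/fstab so the machine stays swap-free
# after a reboot (a backup is kept as /etc/fstab.bak)
sed -i.bak '/\sswap\s/s/^/#/' /etc/fstab
```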
- If kubeadm upgrade fails and does not roll back, for example due to an unexpected shutdown during execution, you can run kubeadm upgrade again: it is idempotent and should eventually ensure that the actual state matches the state you declare.
- You can use kubeadm upgrade with --force to re-apply the version a cluster is already running (x.y.z --> x.y.z), which can be used to recover from a bad state.