Kubernetes

9 Notes
+ Installation (July 25, 2020, 10:47 a.m.)

1- Disable SWAP memory:

    swapoff -a

2- Install Docker using my notes.

3- Install Kubernetes:

    apt-get install -y apt-transport-https curl
    curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
    cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
    deb https://apt.kubernetes.io/ kubernetes-xenial main
    EOF
    apt update
    apt install -y kubelet kubeadm kubectl
    apt-mark hold kubelet kubeadm kubectl

4- Initialize Kubernetes on Master Node:

    kubeadm init --pod-network-cidr=10.244.0.0/16

5- Create a Directory for the Kubernetes Cluster:
Make kubectl work for your non-root user.

    mkdir -p $HOME/.kube
    sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    sudo chown $(id -u):$(id -g) $HOME/.kube/config

6- Pod Network Add-On (Flannel):
Install a pod network add-on so that your pods can communicate effectively.

    kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

7-

+ kubeadm, kubelet and kubectl (July 25, 2020, 10:58 a.m.)

- kubeadm: The command to bootstrap the cluster.
- kubelet: The component that runs on all of the machines in your cluster and does things like starting pods and containers.
- kubectl: The command line util to talk to your cluster.
----------------------------------------------------------------------
kubeadm will not install or manage kubelet or kubectl for you, so you will need to ensure they match the version of the Kubernetes control plane you want kubeadm to install for you.
----------------------------------------------------------------------
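The version-matching requirement above can be checked with a small helper. This is only a sketch: the function names major_minor and same_minor are made up for illustration, and comparing only the major.minor part is a simplifying assumption, not the official version skew policy.

```shell
#!/bin/sh
# major_minor: reduce a version such as "v1.18.6" (or "1.18.6") to "1.18".
# Hypothetical helper name, not part of any Kubernetes tool.
major_minor() {
    printf '%s\n' "$1" | sed -E 's/^v?([0-9]+)\.([0-9]+).*/\1.\2/'
}

# same_minor: succeed only when both versions share the same major.minor.
same_minor() {
    [ "$(major_minor "$1")" = "$(major_minor "$2")" ]
}

# Demonstration with literal version strings (no cluster tools needed).
if same_minor "v1.18.6" "v1.18.3"; then
    echo "versions match"
else
    echo "versions differ"
fi

# On a machine where the tools are installed, the inputs could come from:
#   kubeadm_v="$(kubeadm version -o short)"
#   kubelet_v="$(kubelet --version | awk '{print $2}')"
#   same_minor "$kubeadm_v" "$kubelet_v" || echo "kubeadm and kubelet differ"
```

The commented lines at the end show where the real version strings would come from; they are left as comments so the sketch runs on a machine without Kubernetes installed.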

+ Stateful and Stateless Application / StatefulSet (July 25, 2020, 9:19 a.m.)

What is a Stateful Application?

Examples of stateful applications are databases (MySQL, Elasticsearch, MongoDB, etc.) or any application that stores data to keep track of its state. In other words, these are applications that track their state by saving that information in some storage.

Stateless applications, on the other hand, do not keep records of previous interactions. Each request is handled as a completely new, isolated interaction based entirely on the information that comes with it. Stateless applications sometimes connect to stateful applications to forward those requests.
------------------------------------------------------------------------
What is a StatefulSet?

It's a Kubernetes component that is used specifically for stateful applications.
------------------------------------------------------------------------
Stateless applications are deployed using the Deployment component. Deployment is an abstraction of Pods and allows you to replicate the application, meaning you can run two, five, or ten identical Pods of the same stateless application in the cluster.
------------------------------------------------------------------------
While stateless applications are deployed using Deployment, stateful applications in Kubernetes are deployed using the StatefulSet component. Just like Deployment, StatefulSet makes it possible to replicate the stateful app's Pods, that is, to run multiple replicas of it.
------------------------------------------------------------------------
They both manage Pods that are based on an identical container specification, and you can configure storage with both of them in the same way.

If both manage the replication of Pods and the configuration of data persistence in the same way, what is the difference between those two components? Why do we use different ones for each type of application? The differences are listed below.
------------------------------------------------------------------------
The differences between Deployment and StatefulSet:

- Replicating stateful applications is more difficult and has a couple of requirements that stateless applications do not have.

Example:
Let's say we have a MySQL database Pod that handles requests from a Java application, which is deployed using a Deployment component, and let's say we scaled the Java application to 3 Pods so they can handle more client requests. In parallel, we want to scale the MySQL app so it can handle more Java requests as well.

Scaling the Java application is pretty straightforward. Its replica Pods are identical and interchangeable, so we can scale it using the Deployment pretty easily:
- The Deployment creates the Pods in random order.
- The Pods get random hashes at the end of their names.
- They get one Service that load balances any request to any one of the replica Pods.
- When you delete them, they get deleted in random order or at the same time.
- When you scale them down from 3 to 2 replicas, for example, one random replica Pod gets chosen to be deleted.
So, no complications there!

On the other hand, MySQL Pod replicas cannot be created and deleted at the same time or in any order, and they can't be randomly addressed. The reason is that the replica Pods are not identical. Each has its own identity on top of the common blueprint of the Pod it is created from. Giving each Pod its own required individual identity is what StatefulSet does differently from Deployment.

StatefulSet maintains a sticky identity for each of its Pods. As said earlier, these Pods are created from the same specification, but they are not interchangeable. Each has a persistent identifier that it keeps across any rescheduling, meaning that when a Pod dies and gets replaced by a new Pod, the new Pod keeps that identity.
------------------------------------------------------------------------
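A minimal sketch of what a StatefulSet for the MySQL example could look like. The script only writes a manifest file into a scratch directory. The names mysql and mysql-headless, the image tag, the password placeholder, and the storage size are all assumptions for illustration, and the manifest expects a headless Service called mysql-headless to exist. It shows the sticky identity only; it does not set up MySQL replication between the replicas.

```shell
#!/bin/sh
# Write an illustrative StatefulSet manifest to a scratch directory.
# All names and values inside the manifest are assumptions, not from the notes.
workdir="$(mktemp -d)"
manifest="$workdir/mysql-statefulset.yml"

cat > "$manifest" <<'EOF'
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql-headless   # headless Service that gives each Pod a stable DNS name
  replicas: 3                   # Pods get ordinal names: mysql-0, mysql-1, mysql-2
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - name: mysql
          image: mysql:8.0
          env:
            - name: MYSQL_ROOT_PASSWORD
              value: change-me  # placeholder only; use a Secret in practice
          volumeMounts:
            - name: data
              mountPath: /var/lib/mysql
  volumeClaimTemplates:         # one PersistentVolumeClaim per Pod, kept across rescheduling
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
EOF

echo "wrote $manifest"

# To try it against a real cluster (requires kubectl and cluster access):
#   kubectl apply -f "$manifest"
#   kubectl get pods -l app=mysql
```

Unlike the random hash suffixes a Deployment produces, the Pods here are named by ordinal (mysql-0, mysql-1, mysql-2), and a replaced Pod comes back under the same name with the same volume claim.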

+ Architecture (July 21, 2020, 11:16 a.m.)

A basic setup of one node with two application pods running on it:

One of the main components of the Kubernetes architecture is its worker servers, or nodes. Each node has multiple application pods with containers running on it. Kubernetes manages this with three processes that must be installed on every node and that are used to schedule and manage those pods. Nodes are the cluster servers that do the actual work, which is why they are sometimes also called worker nodes.

1- Container runtime: The first process that needs to run on every node is the container runtime, for example Docker. Because application pods have containers running inside, a container runtime needs to be installed on every node.

2- Kubelet: The process that schedules those pods and the containers underneath is Kubelet, which, unlike the container runtime, is a process of Kubernetes itself. Kubelet has an interface with both the container runtime and the machine, the node itself. At the end of the day, Kubelet is responsible for taking the configuration, starting a pod with a container inside, and assigning resources from the node to the container, like CPU, RAM, and storage.

A Kubernetes cluster is usually made of multiple nodes, all of which must have the container runtime and Kubelet services installed. You can have hundreds of those worker nodes, which run other pods and containers and replicas of the existing pods. Communication between them works through Services, which are a sort of load balancer that catches a request directed to a pod or application, like a database for example, and forwards it to the respective pod.

3- Kube Proxy: The third process, which is responsible for forwarding requests from Services to pods, is Kube Proxy. It must also be installed on every node. Kube Proxy has intelligent forwarding logic that makes sure the communication works in a performant way with low overhead. For example, if an application is making a request to the database, instead of just randomly forwarding the request to any replica, it can forward it to the replica that is running on the same node as the pod that initiated the request. This avoids the network overhead of sending the request to another machine. (Whether node-local replicas are preferred depends on how Kube Proxy and the Service are configured.)

So, to summarize, two Kubernetes processes, Kubelet and Kube Proxy, must be installed on every Kubernetes worker node, along with an independent container runtime, in order for the Kubernetes cluster to function properly.

+ Namespace - Create component in a Namespace (July 21, 2020, 11:05 a.m.)

kubectl apply -f mysql-configmap.yml --namespace=my-namespace
-----------------------------------------------------------------------------
Another way is inside the configuration file itself:

    metadata:
      namespace: my-namespace
-----------------------------------------------------------------------------
kubectl get configmap -n my-namespace
-----------------------------------------------------------------------------
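For reference, a complete mysql-configmap.yml with the namespace set inside the file might look like the sketch below. The ConfigMap name and the db_host entry are assumptions for illustration; only the file name and my-namespace come from the notes above. The script just writes the file to a scratch directory.

```shell
#!/bin/sh
# Write an illustrative ConfigMap manifest whose namespace is set in metadata.
workdir="$(mktemp -d)"
manifest="$workdir/mysql-configmap.yml"

cat > "$manifest" <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: mysql-configmap        # assumed name
  namespace: my-namespace      # the component is created in this namespace
data:
  db_host: mysql-service       # assumed example entry
EOF

echo "wrote $manifest"

# With the namespace in the file, no --namespace flag is needed when applying
# (requires kubectl, cluster access, and an existing my-namespace):
#   kubectl apply -f "$manifest"
#   kubectl get configmap -n my-namespace
```

Keeping the namespace in the file documents where the component lives, while the --namespace flag keeps the file reusable across namespaces.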

+ Namespace - Introduction (July 21, 2020, 9:16 a.m.)

Usages of namespaces:

1- To group resources into namespaces:
For example, you can have a database namespace where you deploy your database and all its required resources. You can have a monitoring namespace where you deploy Prometheus and all the stuff it needs. You can also have an Elastic Stack namespace where all the Elasticsearch, Kibana, etc. resources go together. You can have a namespace for Nginx-Ingress resources.

2- When you have multiple teams:
Imagine a scenario where two teams use the same cluster. One team deploys an application called "my-app deployment", which has a certain configuration. If another team had a deployment that accidentally had the same name, "my-app deployment", but with a different configuration, they would override the first team's deployment. To avoid this kind of conflict you can use namespaces, so that each team can work in its own namespace without disrupting the other.

3-1 Resource Sharing: Staging and Development:
Let's say you have one cluster and you want to host both the Staging and Development environments in it. The reason could be, for example, that you're using something like the Nginx-Ingress Controller, or Elastic Stack for logging. You can deploy these in one cluster and use them for both environments. That way, you don't have to deploy these common resources twice in two different clusters, and both the staging and the development environment can use them.

3-2 Resource Sharing: Blue/Green Deployment:
This means that in the same cluster you want to have two different versions of production: one that is active and in production now, and another that is going to be the next production version. The versions of the applications in the Blue and Green production namespaces will be different. However, as with Staging and Development, these namespaces might need to use the same resources, like the Nginx-Ingress Controller or Elastic Stack. This way they can both use these shared resources without having to set up a separate cluster.

4- Access and Resource Limits of Namespaces:
Again we have a scenario where two teams work in the same cluster and each has its own namespace. You can give each team access to only its own namespace, so it can create/update/delete resources there but can't do anything in the other namespaces. This minimizes the risk of one team accidentally interfering with another team's work, so each one has its own secured, isolated environment.
An additional thing you can do on a namespace level is to limit the resources (CPU, RAM, etc.) that each namespace consumes.
-------------------------------------------------------------------------
In a Kubernetes cluster, you can organize resources in namespaces, so you can have multiple namespaces in a cluster. You can think of a namespace as a virtual cluster inside a Kubernetes cluster. When you create a cluster, Kubernetes gives you some namespaces out of the box by default.

    $ kubectl get namespace

This command lists the out-of-the-box namespaces that Kubernetes offers.

- The "kubernetes-dashboard" namespace is shipped automatically in minikube. It's specific to the minikube installation. You will not have this in a standard cluster.

- The "kube-system" namespace is not meant for your use, so you shouldn't create or modify anything in it. The components that are deployed in this namespace are:
-- System processes
-- Master and Kubectl processes

- The "kube-public" namespace contains publicly accessible data. It has a config map that contains cluster information that is accessible even without authentication.

- The "kube-node-lease" namespace holds information about the heartbeats of nodes. Each node gets its own object that contains the information about that node's availability.

- The "default" namespace is the one you're going to use to create resources at the beginning, if you haven't created a new namespace.
-------------------------------------------------------------------------
You can create new namespaces:

    kubectl create namespace my-namespace
    kubectl get namespace
-------------------------------------------------------------------------
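The per-namespace resource limits mentioned in usage 4 are commonly expressed as a ResourceQuota object. The following is a sketch under stated assumptions: the quota name team-quota and every number in it are made up for illustration, and my-namespace is reused from the commands above. The script only writes the manifest to a scratch directory.

```shell
#!/bin/sh
# Write an illustrative ResourceQuota manifest that caps what one namespace may consume.
workdir="$(mktemp -d)"
manifest="$workdir/team-quota.yml"

cat > "$manifest" <<'EOF'
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota             # assumed name
  namespace: my-namespace
spec:
  hard:
    pods: "10"                 # at most 10 Pods in the namespace
    requests.cpu: "2"          # sum of CPU requests across all Pods
    requests.memory: 4Gi       # sum of memory requests
    limits.cpu: "4"            # sum of CPU limits
    limits.memory: 8Gi         # sum of memory limits
EOF

echo "wrote $manifest"

# On a real cluster (requires kubectl, cluster access, and an existing my-namespace):
#   kubectl apply -f "$manifest"
#   kubectl describe resourcequota team-quota -n my-namespace
```

Restricting which team can create/update/delete resources in a namespace is a separate mechanism (role-based access control) and is not covered by this sketch.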

+ Helm (July 21, 2020, 8:21 a.m.)

Helm has a couple of features that are useful:
- Package Manager for Kubernetes (to package YAML files and distribute them in public and private repositories)
- Templating Engine
- Same applications across different environments
- Release management
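To make the packaging and templating features a bit more concrete, here is a sketch that creates a tiny chart skeleton on disk. The chart name demo-chart, the environment value, and the ConfigMap template are assumptions for illustration. The script only writes files; rendering them requires Helm 3 to be installed.

```shell
#!/bin/sh
# Create a minimal, illustrative Helm chart skeleton in a scratch directory.
workdir="$(mktemp -d)"
chart="$workdir/demo-chart"
mkdir -p "$chart/templates"

# Chart metadata (apiVersion v2 is the chart format used by Helm 3).
cat > "$chart/Chart.yaml" <<'EOF'
apiVersion: v2
name: demo-chart
description: Illustrative chart skeleton
version: 0.1.0
EOF

# Default values; these can be overridden per environment.
cat > "$chart/values.yaml" <<'EOF'
environment: development
EOF

# A template: the {{ ... }} placeholders are filled in by Helm at render time.
cat > "$chart/templates/configmap.yaml" <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ .Release.Name }}-config
data:
  environment: {{ .Values.environment | quote }}
EOF

echo "wrote chart to $chart"

# With Helm 3 installed, the same chart can be rendered for another environment:
#   helm template my-release "$chart" --set environment=staging
```

The same templates with different values files or --set overrides are what let one chart serve several environments.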

+ Basic Concepts (July 18, 2020, 2:57 p.m.)

Pod:
A pod is the smallest unit that you as a Kubernetes user will configure and interact with. A pod is basically a wrapper of a container. On each worker node, you're going to have multiple pods, and inside a pod, you can have multiple containers.

Usually you would have one pod per application, so the only time you would need more than one container inside a pod is when you have a main application that needs some helper containers. A database, for example, would be one pod, a message broker another pod, a server another pod, and your Node.js or Java application would have its own pod.

Each Pod is its own self-contained server with its own IP address, and Pods communicate with each other using those internal IP addresses.

We don't configure or create containers inside the Kubernetes cluster directly; we only work with Pods, which are an abstraction layer over containers. A pod is a Kubernetes component that manages the containers running inside it without our intervention. For example, if a container stops or dies inside a Pod, it will be automatically restarted.

However, Pods are ephemeral components, which means that Pods can also die very frequently, and when a Pod dies a new one gets created. This is where the notion of a Service comes into play. Whenever a Pod gets restarted or recreated, the new Pod gets a new IP address. So, for example, if your application talks to a database Pod using the Pod's IP address and the Pod restarts, it gets a new IP address, and it would be very inconvenient to adjust that IP address all the time. Because of that, another Kubernetes component called Service is used, which is basically a substitute for those IP addresses. Instead of relying on these dynamic IP addresses, a Service sits in front of each Pod, and the Pods talk to each other through the Services. If a Pod behind the Service dies and gets recreated, the Service stays in place, because their life-cycles are not tied to each other.

The Service has two main functions:
1- A permanent IP address that Pods can use to communicate with each other
2- Load balancer
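As a rough illustration of the Pod and Service relationship described above, the sketch below writes a manifest with one Pod and one Service that selects it by label. The names my-app and my-app-service, the nginx image, and the port are assumptions for illustration. The script only writes the file.

```shell
#!/bin/sh
# Write an illustrative manifest containing a Pod and a Service in front of it.
workdir="$(mktemp -d)"
manifest="$workdir/pod-and-service.yml"

cat > "$manifest" <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: my-app
  labels:
    app: my-app                # the Service finds the Pod through this label
spec:
  containers:
    - name: web
      image: nginx:1.25
      ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: my-app-service         # stable name and IP, independent of the Pod's life-cycle
spec:
  selector:
    app: my-app                # requests go to any Pod carrying this label
  ports:
    - port: 80                 # port exposed by the Service
      targetPort: 80           # port the container listens on
EOF

echo "wrote $manifest"

# On a real cluster (requires kubectl and cluster access):
#   kubectl apply -f "$manifest"
#   kubectl get pod my-app -o wide     # shows the Pod's current IP
#   kubectl get service my-app-service # shows the Service's stable cluster IP
```

Because the Service matches Pods by label rather than by IP address, a recreated Pod with the same label is picked up automatically.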

+ Introduction (July 18, 2020, 1:29 p.m.)

Kubernetes is an open-source platform for managing container technologies such as Docker.

Docker lets you create containers for a pre-configured image and application. Kubernetes provides the next step, allowing you to balance loads between containers and run multiple containers across multiple systems.

The simplest description of a Kubernetes cluster would be a set of managed nodes that run applications in containers.
-----------------------------------------------------------------------------
Kubernetes is an open-source container orchestration tool that was originally developed by Google. At its foundation, it manages containers: Docker containers or containers from some other technology.

Kubernetes helps you manage containerized applications that are made up of hundreds or thousands of containers, and helps you manage them in different environments, like physical machines, virtual machines, cloud environments, or even hybrid deployment environments.
-----------------------------------------------------------------------------
What problems does Kubernetes solve? What are the tasks of an orchestration tool?
- The trend from monolith to microservices
- Increased usage of containers
- Demand for a proper way of managing those hundreds of containers
-----------------------------------------------------------------------------
What features do orchestration tools offer?
- High availability or no downtime
- Scalability or high performance
- Disaster recovery: backup and restore
-----------------------------------------------------------------------------
What does the basic Kubernetes architecture look like?

The Kubernetes cluster is made up of at least one Master node, with a couple of worker nodes connected to it. Each node has a Kubelet process running on it. Kubelet is a Kubernetes process that makes it possible for the nodes in the cluster to communicate with each other and to execute tasks on those nodes, like running application processes.

Each worker node has Docker containers of different applications deployed on it. Depending on how the workload is distributed, you would have a different number of Docker containers running on each worker node.
-----------------------------------------------------------------------------
What is running on the Master node?

The Master node runs several Kubernetes processes that are absolutely necessary to run and manage the cluster properly.

- API Server:
One of the processes is the API Server, which is also a container. The API Server is the entry point to the Kubernetes cluster. This is the process that the different Kubernetes clients talk to, like the UI, API, and CLI.

- Controller Manager:
Keeps an overview of what's happening in the cluster: whether something needs to be repaired, or whether a container died and needs to be restarted, etc.

- Scheduler:
The Scheduler is responsible for scheduling containers on different nodes based on the workload and the available server resources on each node. It's an intelligent process that decides which worker node the next container should be scheduled on, based on the available resources on the worker nodes and the load that the container needs.

- ETCD key-value storage:
It holds the current state of the Kubernetes cluster at any time. It has all the configuration data and all the status data of each node and each container inside that node. Backup and restore is done from these ETCD snapshots.

- Virtual Network:
Enables worker nodes and master nodes to talk to each other. It turns all the nodes inside the cluster into one powerful machine that has the sum of all the resources of the individual nodes.
-----------------------------------------------------------------------------
Worker nodes carry most of the load because they run the applications. They're usually much bigger and have more resources, because they will be running hundreds of containers. The master node runs just a handful of master processes, so it doesn't need that many resources.

However, as you can imagine, a master node is much more important than the individual worker nodes. If you lose access to the master node, you will not be able to access the cluster anymore, which means you absolutely need a backup of your master at all times. So in production environments you would usually have at least two masters inside your Kubernetes cluster. In most cases you're going to have multiple masters, so that if one master node is down the cluster continues to function smoothly because the other masters are available.
-----------------------------------------------------------------------------