Kubernetes Ultimate Notes
ETCDCTL_API=3
--listen-client-urls=http://127.0.0.1:2379
--advertise-client-urls=http://127.0.0.1:2379
Kube-apiserver:
Authenticates the user
Validates the request
Retrieves data (for GET requests)
Updates etcd (for POST/PUT requests)
Communicates with the scheduler and kubelet
Let's look at --authorization-mode=Node,RBAC in the upcoming explanation.
--service-account-key-file=/var/lib/kubernetes/service-account.pem
(key used to verify service account tokens; the issuer's private signing key is passed with --service-account-signing-key-file)
K create token <sa name>
--service-cluster-ip-range=10.32.0.0/24
--service-node-port-range=30000-32767
Note:
Manifest files for static pods are stored in /etc/kubernetes/manifests/
Normally certificates are stored in /etc/kubernetes/pki
Systemd service files are stored in /etc/systemd/system
ps aux | grep kube-apiserver
Kube-controller-manager:
--cluster-cidr=
--cluster-name=
Kube-scheduler:
Whenever you think of the scheduler, you must know these concepts:
Taints and tolerations
Node selector, node affinity, pod affinity, and the corresponding anti-affinity concepts as well.
It filters the nodes, ranks them, and schedules the pods onto the nodes.
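A minimal sketch of a taint plus a matching toleration (the node name, key and value are hypothetical, just to show the shape):
K taint nodes node01 app=blue:NoSchedule
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
  tolerations:
  - key: "app"
    operator: "Equal"
    value: "blue"
    effect: "NoSchedule"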
Note: Every component that needs to communicate with the kube-apiserver needs a kubeconfig file (let's evaluate this).
Kubelet:
It registers the node, creates the pods and monitors the node.
It has some important parameters to remember:
--kubeconfig
--container-runtime-endpoint
--network-plugin
Kube-proxy:
Services:
Nothing to note, but we will learn about LoadBalancer/Ingress in the upcoming sessions.
Note:
An init container in Docker is achieved by linking containers:
docker run --link <container name> <image id>
ReplicaSet:
Nothing new, but there is also the older ReplicationController; the difference is the selector support.
K get/delete replicaset
K scale --replicas=2 replicaset <replicaset name>
Or go change the YAML file and
K replace -f <replicasetname.yaml>
Deployments:
Upgrading the pods and rolling back to the previous version is its nature, apart from replicating and scaling.
K set image deploy <deploy name> <container name>=<new image>
K rollout history deploy <deploy name>
K rollout undo deploy <deploy name>
K rollout status deploy <deploy name>
K rollout undo deploy <deploy name> --to-revision=<revision number>
To see the image of a specific revision:
K rollout history deploy <deploy name> --revision=2
Namespaces:
How the objects address each other:
<service name>.<namespace>.svc.cluster.local
<pod IP with dots replaced by dashes>.<namespace>.pod.cluster.local
We will learn more in CoreDNS networking.
The root domain cluster.local is defined in the CoreDNS configuration.
ResourceQuota:
We can set a ResourceQuota for a namespace specifying requests and limits (aggregated across the entire namespace) and even the number of pods that can run in the namespace (the "hard" setting).
Yes, it is the aggregated resource for all the pods, and we can even restrict the count of other objects (e.g. DaemonSets) by specifying them in the ResourceQuota.
K delete
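A minimal ResourceQuota sketch (the namespace name and the numbers are assumptions, just to show the fields):
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-quota
  namespace: dev
spec:
  hard:
    pods: "10"
    requests.cpu: "4"
    requests.memory: 4Gi
    limits.cpu: "8"
    limits.memory: 8Gi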
Scheduling:
nodeName
nodeAffinity
nodeSelector
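A minimal pod sketch showing nodeName, nodeSelector and nodeAffinity side by side (the label key/values are hypothetical; in practice you would normally use only one of these):
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
  # nodeName: node01            # schedules directly onto the node, bypassing the scheduler
  nodeSelector:
    size: Large                  # node must carry the label size=Large
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: size
            operator: In
            values: ["Large"]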
Ingress:
Nothing new to learn about Ingress. It is a single point of contact for our k8s workloads which offers load balancing, host-based/path-based routing and SSL termination (which I don't know yet).
The ingress controller and its family (pods and services) does all the routing in the end. Be careful with its configuration, which can be a ConfigMap if it is deployed as a pod (the nginx.conf file, or a generated .conf that is created when an Ingress object is deployed and merged into nginx.conf).
It comes down to:
rules:
- host:
  http:
    paths:
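A minimal Ingress sketch (the host, service name and port are hypothetical):
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
spec:
  rules:
  - host: web.example.com
    http:
      paths:
      - path: /shop
        pathType: Prefix
        backend:
          service:
            name: shop-service
            port:
              number: 80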
---------
Static Pods:
Environment variables:
You can perform something called exporting when we talk about env variables:
export KUBECONFIG=<path to kubeconfig file>
export ETCDCTL_API=3
Can be achieved in the pod spec as:
spec:
  containers:
  - image:
    name:
    env:
    - name:
      value:
…………
    env:
    - name:
      valueFrom:
        configMapKeyRef:
          name:
          key:
…………
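A minimal pod sketch pulling one env variable from a literal value and one from a ConfigMap (the names are hypothetical):
apiVersion: v1
kind: Pod
metadata:
  name: webapp
spec:
  containers:
  - name: webapp
    image: nginx
    env:
    - name: APP_COLOR
      value: blue
    - name: APP_MODE
      valueFrom:
        configMapKeyRef:
          name: app-config
          key: APP_MODE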
ConfigMap:
K create configmap <name> --from-literal=key=value /
--from-file=<filename>
………
    envFrom:
    - configMapRef:
        name:
……..
Volumes:
- name:
  configMap:
    name:
Secrets:
K create secret generic <secret name> --from-literal=key=value /
--from-file=<filename>
All the data in a created secret is base64-encoded.
Imp Linux concepts:
echo -n "pooja" | base64
echo -n "durga" | base64
echo -n "<encoded value>" | base64 --decode
Note:
There are different types of secrets, like "generic". I am not going into the details of them; will see if that is necessary going forward.
Apart from that, everything is very similar to ConfigMaps: creation, using it as an env variable, or using it as a volume.
While revising I came to know another type of secret apart from generic: service-account-token.
One point to add here: we create a secret of type service-account-token to get a non-expiring token for a service account (while creating the secret, the service account name should be mentioned in the annotations), and every piece of data in the secret is base64-encoded (needs validation).
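A minimal sketch of such a secret (the service account name is hypothetical); the kubernetes.io/service-account.name annotation ties it to the sa:
apiVersion: v1
kind: Secret
metadata:
  name: build-sa-token
  annotations:
    kubernetes.io/service-account.name: build-sa
type: kubernetes.io/service-account-token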
Note:
Init containers are not shown in the K get po output.
Another important note: every container that a pod creates must be given a name.
Cordon/uncordon vs drain
K cordon <node-name>   (marks the node unschedulable)
K uncordon <node-name> (marks it schedulable again)
K drain <node-name>    (evicts the pods on the node and marks it unschedulable)
K get nodes -- you will see the version there, which is the kubelet version on each node (effectively the cluster version).
Points to Remember..
The k8s community supports only the two minor versions below the latest release.
The version of a node is indeed the version of the kubelet running on that node.
Taking the kube-apiserver version as the base version (x), the allowed cluster component versions are:
Scheduler, controller-manager: x or x-1
Kubelet, kube-proxy: x, x-1 or x-2
Kubectl: x+1, x or x-1
CoreDNS / etcd are third-party components whose versions are independent of the k8s version.
Procedure involved in upgrading the Kubernetes cluster:
The bible here is the k8s documentation; follow it blindly.
The typical steps involve:
Update your apt package index itself and see what versions of kubeadm are available:
apt update
apt-cache madison kubeadm
Upgrade kubeadm and confirm it is the version you are interested in:
apt-mark unhold kubeadm && \
apt-get update && apt-get install -y kubeadm=1.27.x-00 && \
apt-mark hold kubeadm
See the plan that kubeadm offers and apply the respective version:
kubeadm upgrade plan
kubeadm upgrade apply <version>
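After the control plane is upgraded, the usual remaining steps per the docs (a sketch; the version string is just an example) are to drain each node, upgrade kubelet/kubectl, restart the kubelet and uncordon:
K drain <node-name> --ignore-daemonsets
apt-mark unhold kubelet kubectl && \
apt-get update && apt-get install -y kubelet=1.27.x-00 kubectl=1.27.x-00 && \
apt-mark hold kubelet kubectl
# on worker nodes, run: kubeadm upgrade node
sudo systemctl daemon-reload
sudo systemctl restart kubelet
K uncordon <node-name>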
Ah! That was a very long test. There are many important things that I have learned from it.
The etcd server itself uses a --data-dir, which is added as a volume mount to the container.
Note:
To switch between clusters, just like that:
kubectl config use-context <context name>
Note:
If etcd is running on the same machine as the control plane, the setup is called a stacked etcd topology.
Ahaaaaaaaa, again I have taken a good amount of time on this test. There are a lot of points that I have learnt.
Whenever we take a backup, it is good to move it to another secured server.
scp file.snapshot student-server:
It moves the file from the current server to the remote user's home directory on student-server.
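A sketch of the etcd backup/restore commands (the endpoints, certificate paths and data dir shown are the usual kubeadm defaults and are assumptions here):
ETCDCTL_API=3 etcdctl snapshot save /opt/etcd-backup.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key
ETCDCTL_API=3 etcdctl snapshot restore /opt/etcd-backup.db \
  --data-dir=/var/lib/etcd-from-backup
# then point the etcd static pod's volume/--data-dir to the new directory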
Service Accounts:
This is a topic you have to pay attention to because it has undergone different changes right after v1.19.
Old way of working:
Roles and RoleBindings are for users.
ServiceAccounts are for the services that access the cluster.
K create sa <name>
A secret used to be created along with the sa.
Describe the secret to fetch the token and give it to the service.
If the service runs inside the cluster, the token is mounted as a volume.
Note:
When a cluster is brought up, a default sa is created and this sa is given to the pods; it has limited privileges. If you want to assign your own service account:
spec:
  serviceAccountName: <sa name>
  containers:
  - ...
The token is then mounted as a volume, typically under /var/run/secrets/kubernetes.io/serviceaccount.
This poses a security risk because the token is the same for all the pods in the namespace and has no expiry date. Also, there is the additional effort of creating/reading secrets to get the token.
Note:
spec:
  serviceAccountName: <sa name>
  automountServiceAccountToken: false
It is also possible to add our own serviceAccount while disabling the automount of the default token.
For a private registry, a docker-registry type secret (K create secret docker-registry ...) needs to be given.
This secret should be referenced as:
spec:
  containers:
  - ...
  imagePullSecrets:
  - name: <secret name>
Security in running a container in Docker carried over into Kubernetes (docker run --user, --cap-add).
spec:
  template:
    spec:
      containers:
      - securityContext:
          runAsUser:
          capabilities:
          privileged:
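A minimal pod sketch with a security context (the user IDs and capability are hypothetical):
apiVersion: v1
kind: Pod
metadata:
  name: secure-pod
spec:
  securityContext:
    runAsUser: 1000          # pod-level: applies to all containers
  containers:
  - name: ubuntu
    image: ubuntu
    command: ["sleep", "3600"]
    securityContext:
      runAsUser: 1001        # container-level overrides the pod level
      capabilities:
        add: ["SYS_TIME"]    # capabilities are container-level only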
NetworkPolicies:
Nothing new about network policies, but there are a few points to remember.
The ingress rule is configured for the request; we do not need to create an egress rule for the reply. We only need to configure an egress rule if the pod initiates a request of its own in turn.
policyTypes:
- Ingress
- Egress
It works this way:
ingress:
- from:
  - ipBlock:
  ports:
  - protocol:
    port:
egress:
- to:
This is the structure that you should remember.
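A minimal NetworkPolicy sketch (the labels, CIDR and ports are hypothetical):
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-policy
spec:
  podSelector:
    matchLabels:
      role: db
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: api
    ports:
    - protocol: TCP
      port: 3306
  egress:
  - to:
    - ipBlock:
        cidr: 192.168.5.10/32
    ports:
    - protocol: TCP
      port: 80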
Note:
Other beautiful commands that help to debug are:
crictl ps -a
crictl logs <container id>
If it is a problem with a connection, normally it is the expiration date of a certificate or the certificates not being mounted correctly.
Note:
While creating a CSR object, signerName should be mentioned; look up the default signer names in the documentation.
usages also needs to be mentioned; look for the allowable values in the docs (e.g. "client auth" or "server auth").
Another trick with CSRs: if we want to see what groups the user is requesting access for, it can only be seen in YAML output; the describe command will not show these details.
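A minimal CSR sketch (the user name and base64 CSR content are placeholders):
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: jane
spec:
  request: <base64-encoded .csr file>
  signerName: kubernetes.io/kube-apiserver-client
  usages:
  - client auth
Then approve or deny it with K certificate approve jane / K certificate deny jane.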
Note:
You can only check auth this way for the namespaced resources.
If you are stuck on what resource name to use, use the one from the get command.
K explain storageclass   # will give all the details along with its apiGroup
verbs: ["*"]   # possible
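A minimal Role/RoleBinding sketch plus the auth check (the user and namespace are hypothetical):
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: dev
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: dev
subjects:
- kind: User
  name: jane
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
Check it with: K auth can-i list pods --as jane -n dev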
Note:
It is always better to look at the info in YAML format when we can't see it in the describe output, e.g. the serviceAccount used by a pod.
Note:
A RoleBinding subject can be of kind "ServiceAccount" for the sa object.
A token can be created manually, without the sa admission controller, using the TokenRequest API:
K create token <sa name>
Note:
automountServiceAccountToken disables the automatic mounting of the token into the pod by the ServiceAccount admission controller, but not the sa itself.
securityContext: also takes the capabilities field.
Beautiful, you can just get the manifest file structure with:
K get networkpolicy -o yaml
Note:
Carefully look at the indentation when creating NetworkPolicies.
Storage:
Nothing new about PVs (hostPath / awsElasticBlockStore (fsType), ...) apart from an understanding of access modes:
ReadWriteOnce: the volume can be read and written by a single node; however, all the pods on that node can read and write to it.
ReadWriteMany: the volume can be read and written by many nodes.
ReadOnlyMany: the volume can be mounted read-only by many nodes.
ReadWriteOncePod: only one pod can read and write to it, no other pod on any node.
This is the order in which a PVC is checked for binding to a PV:
Sufficient capacity
Access modes
Volume modes
Storage class
Selector
Note: Not sure about volume modes and selector. Will get to know.
Selector: a PVC can select a PV based on the labels on the PV.
One of the volumeModes is Filesystem (the other is Block).
Note: If the criteria don't match, the PVC will stay in Pending state.
So, for a PV, the capacity, access modes and the type are important.
There is something called persistentVolumeReclaimPolicy: Retain, Delete or Recycle. (Available, Bound, Released, Failed are the PV states, not reclaim policies.)
The "Recycle" reclaim policy is deprecated, but we can create a custom pod which deletes the data in the volume.
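A minimal PV/PVC sketch (the hostPath and sizes are hypothetical):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-log
spec:
  capacity:
    storage: 1Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /pv/log
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: claim-log
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 500Mi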
Note:
StatefulSets are out of the CKA syllabus, but they are important; you can learn them.
StorageClasses:
When a PV reserves storage from Google Cloud, the cloud disk first needs to be created, and only then can you create a PV specifying its name. This is called static provisioning.
But a StorageClass (defining a provisioner) will create the disk using the provisioning plugin whenever a PVC claims storage mentioning that storageClassName, and it automatically creates the PV. This is dynamic provisioning.
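A minimal StorageClass + PVC sketch (the provisioner shown is the GCE PD one as an example; the names are hypothetical):
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gcp-standard
provisioner: kubernetes.io/gce-pd
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim
spec:
  storageClassName: gcp-standard
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi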
Note:
persistentVolumeReclaimPolicy: Retain
We can't delete the PVC while a pod is using the claim.
We need to first delete the pod and then the PVC.
The PV will only go to the Released state after the PVC is deleted.
When we talk about DNS, the two helpful commands to get DNS details are:
nslookup <hostname/IP>
dig <hostname/IP>
When we talk about CoreDNS, the important config file we should know is the Corefile (mounted into the coredns pod from a ConfigMap), which has the info on where to look up DNS entries.
/etc/hosts
Note:
Even another server can act as a router. That said, data transfer between two interfaces is possible only if
/proc/sys/net/ipv4/ip_forward = 1 (or net.ipv4.ip_forward=1 in /etc/sysctl.conf to make it persistent).
/etc/resolv.conf:
nameserver <ip address>
search <web.com> <cloud.web.com>
DNS entries:
A records (IPv4)
AAAA records (IPv6)
CNAME (canonical name) records
NAT tables:
Network Address Translation.
It allows the hosts in your private network to reach the public network, using a NAT router (it can be another Linux machine as well).
How?
There is a connection tracking table in /proc/net/nf_conntrack,
which defines what the source address, destination address, source port and destination port are.
These are what we call NAT entries.
iptables -t nat -A POSTROUTING -s <source address/CIDR> -j MASQUERADE
will add an entry of type MASQUERADE to the nat table.
iptables -t nat -L -v -n
# will list the entries in the nat table
There are other types of targets as well: SNAT, DNAT…
Note:
A LAN can be divided into different VLANs ( like diff subnets) to
improve efficiency, security and performance.
ip link
ip netns add red
ip netns add blue
ip link add veth-red type veth peer name veth-blue
# setting up a virtual ethernet (veth) pair.
ip link set veth-red netns red
ip link set veth-blue netns blue
ip netns exec red ip addr add <ip address> dev veth-red
ip netns exec blue ip addr add <ip address> dev veth-blue
# attaching each end of the pair to a namespace and giving it an address.
ip -n blue link set veth-blue up
ip -n red link set veth-red up
# bringing the interfaces up.
If there are more than two network namespaces, how is a bridge network configured?
Create veth pairs, attach one end to each namespace and the other end to a common bridge (which is considered an interface to the host system).
ip link add veth-red type veth peer name veth-red-brd
ip link set veth-red netns red
ip link set veth-red-brd master v-net-0
ip -n red addr add <ip address> dev veth-red
Since v-net-0 is an interface to the host system, how do we add the address?
ip addr add <address> dev v-net-0
How does the traffic on the virtual interfaces reach the physical switch?
It reaches the physical switch by using the host's bridge address as a gateway:
ip -n red route add <destination network> via <host's v-net-0 address>
To transfer packets between the physical interface and the virtual interfaces, IP forwarding should be enabled.
When the traffic goes out of the physical interface, rewrite the source address (the namespace's virtual IP) to the host's address, so that the reply traffic knows the exact destination.
IP rewriting is done in the nat table:
iptables -t nat -A POSTROUTING -s <internal network CIDR> -j MASQUERADE
Note:
ip link del veth-red   # deleting one end of a veth pair deletes both ends
There is this concept of port forwarding in Docker:
docker run -p 8080:80 <image name>
It means that any request to host port 8080 is forwarded to container port 80.
How does Docker do this?
Docker creates a DNAT rule to replace the destination IP/port with the container's:
iptables -t nat -A PREROUTING -p tcp --dport 8080 -j DNAT --to-destination <container address>:80
Note:
All these instructions, packed into a standard set of rules, are how a container network interface (CNI) plugin is developed.
docker run --network none <image name>
bridge add <cont id> /var/run/netns/<cont id>
Note:
Docker doesn't support CNI because it was built way before it. There was a workaround in k8s to use Docker as a container runtime, but it is no longer supported (containerd is typically used now, with crictl as the CLI).
So the "container runtime" creates the container, and the "container network interface" creates the networking for the container.
--------------------------------
inspect kube-apiserver
---------------------
------------
What is the default gateway configured for the pods to reach the other pods?
Pods try to connect to the other pods through the weave interface; try to find the answer from it.
Display all the sockets along with their program names:
netstat -anp | grep etcd
Note:
It is important that, whichever CNI plugin is configured, the corresponding network solution is installed.
Note:
The default gateway for the pods on any node is the bridge interface
created by the cni service.
Note:
Why do we need to mention all the available etcd servers to the kube-apiserver?
Etcd is a distributed system; we can access it through any of the available servers, and they themselves elect a leader to update the database.
Some points to know about etcd:
Etcd uses the Raft protocol and works in leader-election mode.
The minimum number of nodes required for HA functionality, what is called the quorum, is
floor(n/2) + 1 (e.g. 2 nodes for a 3-node cluster, 3 for a 5-node cluster).
Just a note:
Vagrant is a VM provisioning tool.
The process of setting up the Kubernetes cluster goes like this:
Bring up the VMs using Vagrant.
Install the container runtime.
Install kubeadm.
Initialize the master.
Set up the pod network.
Join the workers.
In-detail steps:
After bringing up the VMs:
Before installing the container runtime, know what Linux distribution you are running.
Set the ip_forward value to 1 and load all the necessary kernel modules required for the network tracking mechanisms (a sketch follows below).
[Follow the documentation]
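Roughly, the prerequisite commands from the kubeadm docs look like this (a sketch; verify against the current documentation):
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter

cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF
sudo sysctl --system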
After that, install the container runtime. The easy way is to use the apt-get package tool. Try to install only the container runtime and no other modules.
[Follow the documentation, which takes you to the GitHub repository]
Application failure:
Check that names, endpoints, ports and environment variables match.
Control plane failure:
Mostly look at the logs of the containers for detailed verbose output.
Wrong executable, wrong mounting, etc.
Worker node failure:
Mostly an issue with the kubelet.
Carefully look at the journalctl -u kubelet logs.
Go one by one; carefully look at all the kubelet config files.
Network failure:
Issues with the CNI, like the CNI not being installed.
Kube-proxy issues: carefully look at the kube-proxy DaemonSet; values in the ConfigMap should match those in the DaemonSet manifest (unlike volumes, where we check whether the files exist or not, here it is about the value).
Note:
The root element in a JSONPath query is $.
$.car.color
$.vehicles.color
Any output of a JSONPath query is a list [].
If this is a list:
[
  "pooja",
  "durga"
]
$[0] returns the first element.
$[?(@.model == "rear right")] filters the list by a field value.
$[2] returns the third element; $[?(@ > 5)] prints all the values in the list greater than 5.
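A couple of kubectl examples using JSONPath (the column names in the custom-columns form are my own choice):
K get nodes -o jsonpath='{.items[*].metadata.name}'
K get nodes -o jsonpath='{.items[*].status.nodeInfo.kubeletVersion}'
K get nodes -o custom-columns=NODE:.metadata.name,VERSION:.status.nodeInfo.kubeletVersion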
To allow traffic from all sources, the ingress rule in a NetworkPolicy can be left empty.
During troubleshooting, it's good to start checking from the name. We can't debug if the container itself is not created.
Why is the initContainer going into the CrashLoopBackOff error?
From version 1.28 a sidecar container is implemented as an init container, by setting its restartPolicy to Always.
Before 1.28 it was implemented as a regular container alongside the main one.
So when looking at the docs, look at the correct version of the docs.
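A minimal sketch of the 1.28+ sidecar pattern (the names, image and command are hypothetical; on 1.28 itself this sits behind a feature gate):
apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecar
spec:
  initContainers:
  - name: log-shipper
    image: busybox
    command: ["sh", "-c", "tail -F /var/log/app.log"]
    restartPolicy: Always      # makes this init container behave as a sidecar
  containers:
  - name: app
    image: nginx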
Volumes:
volumes:
- name: v1
  configMap:
    name: <configmap name>
volumeMounts:
- name: v1
  mountPath: <mount path 1>
- name: v1
  mountPath: <mount path 2>
It works like..
/var/lib/kubelet/pods/<pod ID>/volumes/mountpath/value
/var/lib/kubelet/pods/<pod ID>/mountpath/value