
Kubernetes CKA 1000 Troubleshooting

The document outlines troubleshooting steps for a Kubernetes control plane failure, including checking the status of nodes and pods, listing control plane pods and their status, and checking the status of control plane services like kube-apiserver, kube-controller-manager, and kube-scheduler. Logs from the kube-apiserver control plane pod are also examined to help diagnose the issue.

Uploaded by

Sepehr

Course Objectives

Core Concepts

Scheduling
Logging Monitoring

Application Lifecycle Management

Cluster Maintenance

Security

Storage

Networking

Installation, Configuration & Validation

Troubleshooting

Application Failure
Control Plane Failure
Worker Node Failure
Networking

Application Failure
Check Service Status
curl http://web-service-ip:node-port
curl: (7) Failed to connect to web-service-ip port node-port: Connection timed out

kubectl describe service web-service
Name:                     web-service
Namespace:                default
Labels:                   <none>
Annotations:              <none>
Selector:                 name=webapp-mysql
Type:                     NodePort
IP:                       10.96.0.156
Port:                     <unset>  8080/TCP
TargetPort:               8080/TCP
NodePort:                 <unset>  31672/TCP
Endpoints:                10.32.0.6:8080
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>

Pod definition:
apiVersion: v1
kind: Pod
metadata:
  name: webapp-mysql
  labels:
    app: example-app
    name: webapp-mysql
spec:
  containers:
  - name: webapp-mysql
    image: simple-webapp-mysql
    ports:
    - containerPort: 8080

Diagram: WEB-Service, WEB, DB-Service, DB
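The Service above selects pods with name=webapp-mysql, so a natural follow-up is to confirm that the pod's labels really contain that pair. The sketch below shows one way to make that comparison on plain text. The helper name selector_matches is hypothetical and not part of kubectl.

```shell
# selector_matches SELECTOR LABELS   (hypothetical helper)
# SELECTOR: one key=value pair, e.g. "name=webapp-mysql"
# LABELS:   comma-separated pod labels, e.g. "app=example-app,name=webapp-mysql"
# Returns 0 when the selector pair is among the labels, 1 otherwise.
selector_matches() {
    case ",$2," in
        *",$1,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# On a live cluster the same question can be asked directly (not run here):
#   kubectl get pods --selector=name=webapp-mysql
#   kubectl get pods --show-labels
```

If the selector matches no pod, the Endpoints field of the Service is empty, which is usually the quickest hint that the labels and the selector disagree.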
Check POD
kubectl get pod
NAME   READY   STATUS    RESTARTS   AGE
web    1/1     Running   5          50m

kubectl describe pod web
...

Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  52m   default-scheduler  Successfully assigned webapp-mysql to worker-1
  Normal  Pulling    52m   kubelet, worker-1  pulling image "simple-webapp-mysql"
  Normal  Pulled     52m   kubelet, worker-1  Successfully pulled image "simple-webapp-mysql"
  Normal  Created    52m   kubelet, worker-1  Created container
  Normal  Started    52m   kubelet, worker-1  Started container

kubectl logs web -f --previous
10.32.0.1 - - [01/Apr/2019 12:51:55] "GET / HTTP/1.1" 200 -
10.32.0.1 - - [01/Apr/2019 12:51:55] "GET /static/img/success.jpg HTTP/1.1" 200 -
10.32.0.1 - - [01/Apr/2019 12:51:55] "GET /favicon.ico HTTP/1.1" 404 -
10.32.0.1 - - [01/Apr/2019 12:51:57] "GET / HTTP/1.1" 200 -
10.32.0.1 - - [01/Apr/2019 12:51:57] "GET / HTTP/1.1" 200 -
10.32.0.1 - - [01/Apr/2019 12:51:58] "GET / HTTP/1.1" 200 -
10.32.0.1 - - [01/Apr/2019 12:51:58] "GET / HTTP/1.1" 200 -
10.32.0.1 - - [01/Apr/2019 12:51:58] "GET / HTTP/1.1" 400 -
Some Database Error application exiting!

Diagram: WEB-Service, WEB, DB-Service, DB
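A non-zero RESTARTS count, as in the listing above, is the cue to read the previous container's logs. As a minimal sketch, the hypothetical helper restarting_pods below picks those pods out of `kubectl get pods` output. It assumes the default column order (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# restarting_pods (hypothetical helper): reads `kubectl get pods` output on
# stdin and prints "NAME RESTARTS" for every pod whose RESTARTS column is
# above zero. The header line is skipped.
restarting_pods() {
    awk 'NR > 1 && $4 + 0 > 0 { print $1, $4 }'
}

# Usage on a live cluster (not run here):
#   kubectl get pods | restarting_pods
#   kubectl logs web --previous    # logs of the container instance that exited
```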
Check Dependent Service
Diagram: WEB-Service, WEB, DB-Service, DB

Check Dependent Applications
Diagram: WEB-Service, WEB, DB-Service, DB
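When checking the dependent service, one quick signal is whether it has any endpoints at all. The sketch below filters `kubectl get endpoints` output for services with none. The helper name services_without_endpoints is hypothetical, and the sketch assumes the default columns (NAME, ENDPOINTS, AGE).

```shell
# services_without_endpoints (hypothetical helper): reads
# `kubectl get endpoints` output on stdin and prints the name of every
# service whose ENDPOINTS column is "<none>".
services_without_endpoints() {
    awk 'NR > 1 && $2 == "<none>" { print $1 }'
}

# Usage on a live cluster (not run here):
#   kubectl get endpoints | services_without_endpoints
```

A service listed here has no pod behind it, so the next step is the same service and pod checks shown earlier, applied to the dependent application.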

Control Plane Failure
Check Node Status
kubectl get nodes
NAME       STATUS   ROLES    AGE   VERSION
worker-1   Ready    <none>   8d    v1.13.0
worker-2   Ready    <none>   8d    v1.13.0

kubectl get pods
NAME           READY   STATUS    RESTARTS   AGE
mysql          1/1     Running   0          113m
webapp-mysql   1/1     Running   0          113m
Check Controlplane Pods
kubectl get pods -n kube-system
NAME                             READY   STATUS    RESTARTS   AGE
coredns-78fcdf6894-5dntv         1/1     Running   0          1h
coredns-78fcdf6894-knpzl         1/1     Running   0          1h
etcd-master                      1/1     Running   0          1h
kube-apiserver-master            1/1     Running   0          1h
kube-controller-manager-master   1/1     Running   0          1h
kube-proxy-fvbpj                 1/1     Running   0          1h
kube-proxy-v5r2t                 1/1     Running   0          1h
kube-scheduler-master            1/1     Running   0          1h
weave-net-7kd52                  2/2     Running   1          1h
weave-net-jtl5m                  2/2     Running   1          1h
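In the listing above every control plane pod is Running and fully ready. On a failing cluster the interesting rows are the exceptions, and a small filter can surface them. This is a sketch only: unhealthy_pods is a hypothetical helper, and it assumes the default column order of `kubectl get pods`.

```shell
# unhealthy_pods (hypothetical helper): reads `kubectl get pods` output on
# stdin and prints "NAME READY STATUS" for pods that are not Running or
# whose READY count is short of the total (for example 0/1).
unhealthy_pods() {
    awk 'NR > 1 {
        split($2, r, "/")              # READY has the form ready/total
        if ($3 != "Running" || r[1] != r[2]) print $1, $2, $3
    }'
}

# Usage on a live cluster (not run here):
#   kubectl get pods -n kube-system | unhealthy_pods
```

Pods in the Completed state would also be printed by this filter, which is harmless for kube-system but worth keeping in mind elsewhere.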
Check Controlplane Services
service kube-apiserver status
● kube-apiserver.service - Kubernetes API Server
   Loaded: loaded (/etc/systemd/system/kube-apiserver.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2019-03-20 07:57:25 UTC; 1 weeks 1 days ago
     Docs: https://github.com/kubernetes/kubernetes
 Main PID: 15767 (kube-apiserver)
    Tasks: 13 (limit: 2362)

service kube-controller-manager status
● kube-controller-manager.service - Kubernetes Controller Manager
   Loaded: loaded (/etc/systemd/system/kube-controller-manager.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2019-03-20 07:57:25 UTC; 1 weeks 1 days ago
     Docs: https://github.com/kubernetes/kubernetes
 Main PID: 15771 (kube-controller)
    Tasks: 10 (limit: 2362)

service kube-scheduler status
● kube-scheduler.service - Kubernetes Scheduler
   Loaded: loaded (/etc/systemd/system/kube-scheduler.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2019-03-29 01:45:32 UTC; 11min ago
     Docs: https://github.com/kubernetes/kubernetes
 Main PID: 28390 (kube-scheduler)
    Tasks: 10 (limit: 2362)
Check Controlplane Services
service kubelet status
● kubelet.service - Kubernetes Kubelet
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2019-03-20 14:22:06 UTC; 1 weeks 1 days ago
     Docs: https://github.com/kubernetes/kubernetes
 Main PID: 1281 (kubelet)
    Tasks: 24 (limit: 1152)

service kube-proxy status
● kube-proxy.service - Kubernetes Kube Proxy
   Loaded: loaded (/etc/systemd/system/kube-proxy.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2019-03-20 14:21:54 UTC; 1 weeks 1 days ago
     Docs: https://github.com/kubernetes/kubernetes
 Main PID: 794 (kube-proxy)
    Tasks: 7 (limit: 1152)
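Where the components run as systemd services, as in the status output above, the per-service checks can be folded into one loop. The sketch below uses `systemctl is-active` in place of `service ... status`, which is a swap made here for brevity. report_services is a hypothetical helper, and the unit names are the ones shown in the slides.

```shell
# report_services UNIT... (hypothetical helper): prints "<unit>: <state>"
# for each systemd unit and returns non-zero if any unit is not active.
report_services() {
    rc=0
    for unit in "$@"; do
        # is-active prints a one-word state such as active, inactive or failed
        state=$(systemctl is-active "$unit" 2>/dev/null)
        [ -n "$state" ] || state=unknown
        echo "$unit: $state"
        [ "$state" = "active" ] || rc=1
    done
    return "$rc"
}

# Usage on a control plane node (not run here):
#   report_services kube-apiserver kube-controller-manager kube-scheduler
# Usage on a worker node (not run here):
#   report_services kubelet kube-proxy
```

On clusters where the control plane runs as pods in kube-system, these units do not exist and the pod listing is the place to look instead.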
Check Service Logs
kubectl logs kube-apiserver-master -n kube-system
I0401 13:45:38.190735 1 server.go:703] external host was not specified, using 172.17.0.117
I0401 13:45:38.194290 1 server.go:145] Version: v1.11.3
I0401 13:45:38.819705 1 plugins.go:158] Loaded 8 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,Priority,DefaultTolerationSeconds,DefaultStorageClass,MutatingAdmissionWebhook.
I0401 13:45:38.819741 1 plugins.go:161] Loaded 6 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,ValidatingAdmissionWebhook,ResourceQuota.
I0401 13:45:38.821372 1 plugins.go:158] Loaded 8 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,Priority,DefaultTolerationSeconds,DefaultStorageClass,MutatingAdmissionWebhook.
I0401 13:45:38.821410 1 plugins.go:161] Loaded 6 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,ValidatingAdmissionWebhook,ResourceQuota.
I0401 13:45:38.985453 1 master.go:234] Using reconciler: lease
W0401 13:45:40.900380 1 genericapiserver.go:319] Skipping API batch/v2alpha1 because it has no resources.
W0401 13:45:41.370677 1 genericapiserver.go:319] Skipping API rbac.authorization.k8s.io/v1alpha1 because it has no resources.
W0401 13:45:41.381736 1 genericapiserver.go:319] Skipping API scheduling.k8s.io/v1alpha1 because it has no resources.

sudo journalctl -u kube-apiserver
Mar 20 07:57:25 master-1 systemd[1]: Started Kubernetes API Server.
Mar 20 07:57:25 master-1 kube-apiserver[15767]: I0320 07:57:25.553377 15767 flags.go:33] FLAG: --address="127.0.0.1"
Mar 20 07:57:25 master-1 kube-apiserver[15767]: I0320 07:57:25.558273 15767 flags.go:33] FLAG: --admission-control="[]"
Mar 20 07:57:25 master-1 kube-apiserver[15767]: I0320 07:57:25.558325 15767 flags.go:33] FLAG: --admission-control-config-file=""
Mar 20 07:57:25 master-1 kube-apiserver[15767]: I0320 07:57:25.558339 15767 flags.go:33] FLAG: --advertise-address="192.168.5.11"
Mar 20 07:57:25 master-1 kube-apiserver[15767]: I0320 07:57:25.558353 15767 flags.go:33] FLAG: --allow-privileged="true"
Mar 20 07:57:25 master-1 kube-apiserver[15767]: I0320 07:57:25.558365 15767 flags.go:33] FLAG: --alsologtostderr="false"
Mar 20 07:57:25 master-1 kube-apiserver[15767]: I0320 07:57:25.558413 15767 flags.go:33] FLAG: --anonymous-auth="true"
Mar 20 07:57:25 master-1 kube-apiserver[15767]: I0320 07:57:25.558425 15767 flags.go:33] FLAG: --api-audiences="[]"
Mar 20 07:57:25 master-1 kube-apiserver[15767]: I0320 07:57:25.558442 15767 flags.go:33] FLAG: --apiserver-count="3"
Mar 20 07:57:25 master-1 kube-apiserver[15767]: I0320 07:57:25.558454 15767 flags.go:33] FLAG: --audit-dynamic-configuration="false"
Mar 20 07:57:25 master-1 kube-apiserver[15767]: I0320 07:57:25.558464 15767 flags.go:33] FLAG: --audit-log-batch-buffer-size="10000"
Mar 20 07:57:25 master-1 kube-apiserver[15767]: I0320 07:57:25.558474 15767 flags.go:33] FLAG: --audit-log-batch-max-size="1"
Mar 20 07:57:25 master-1 kube-apiserver[15767]: I0320 07:57:25.558484 15767 flags.go:33] FLAG: --audit-log-batch-max-wait="0s"
Mar 20 07:57:25 master-1 kube-apiserver[15767]: I0320 07:57:25.558495 15767 flags.go:33] FLAG: --audit-log-batch-throttle-burst="0"
Mar 20 07:57:25 master-1 kube-apiserver[15767]: I0320 07:57:25.558504 15767 flags.go:33] FLAG: --audit-log-batch-throttle-enable="false"
Mar 20 07:57:25 master-1 kube-apiserver[15767]: I0320 07:57:25.558514 15767 flags.go:33] FLAG: --audit-log-batch-throttle-qps="0"
Mar 20 07:57:25 master-1 kube-apiserver[15767]: I0320 07:57:25.558528 15767 flags.go:33] FLAG: --audit-log-format="json"
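Most of the log lines above are informational (they start with I). When hunting for a fault it can help to keep only warnings, errors and fatals. The sketch below does that for both the `kubectl logs` and the `journalctl` forms shown above. klog_problems is a hypothetical helper, and the pattern is an assumption based on the line layout visible in these captures.

```shell
# klog_problems (hypothetical helper): reads component log lines on stdin
# and keeps only those whose severity letter is W, E or F. In the captures
# above the severity letter is followed by month and day (e.g. "W0401") and
# a timestamp; journalctl puts "host unit[pid]: " in front of that.
klog_problems() {
    grep -E '(^|\]: )[WEF][0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}\.'
}

# Usage (not run here):
#   kubectl logs kube-apiserver-master -n kube-system | klog_problems
#   sudo journalctl -u kube-apiserver --no-pager | klog_problems
```

grep exits with status 1 when nothing matches, so an empty result with a non-zero status simply means no warnings or errors were found.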

Worker Node Failure
Check Node Status
kubectl get nodes
NAME       STATUS     ROLES    AGE   VERSION
worker-1   Ready      <none>   8d    v1.13.0
worker-2   NotReady   <none>   8d    v1.13.0

kubectl describe node worker-1
...
Conditions:
  Type             Status    LastHeartbeatTime                 Reason                       Message
  ----             ------    -----------------                 ------                       -------
  OutOfDisk        False     Mon, 01 Apr 2019 14:30:33 +0000   KubeletHasSufficientDisk     kubelet has sufficient disk space available
  MemoryPressure   False     Mon, 01 Apr 2019 14:30:33 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False     Mon, 01 Apr 2019 14:30:33 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False     Mon, 01 Apr 2019 14:30:33 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True      Mon, 01 Apr 2019 14:30:33 +0000   KubeletReady                 kubelet is posting ready status. AppArmor enabled

kubectl describe node worker-1
...
Conditions:
  Type             Status    LastHeartbeatTime                 Reason                       Message
  ----             ------    -----------------                 ------                       -------
  OutOfDisk        Unknown   Mon, 01 Apr 2019 14:20:20 +0000   NodeStatusUnknown            Kubelet stopped posting node status.
  MemoryPressure   Unknown   Mon, 01 Apr 2019 14:20:20 +0000   NodeStatusUnknown            Kubelet stopped posting node status.
  DiskPressure     Unknown   Mon, 01 Apr 2019 14:20:20 +0000   NodeStatusUnknown            Kubelet stopped posting node status.
  PIDPressure      False     Mon, 01 Apr 2019 14:20:20 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            Unknown   Mon, 01 Apr 2019 14:20:20 +0000   NodeStatusUnknown            Kubelet stopped posting node status.
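The first step above is spotting which node is not Ready before describing it. As a minimal sketch, the hypothetical helper not_ready_nodes below extracts those names from `kubectl get nodes` output, assuming the default column order (NAME, STATUS, ROLES, AGE, VERSION).

```shell
# not_ready_nodes (hypothetical helper): reads `kubectl get nodes` output on
# stdin and prints the name of every node whose STATUS does not start with
# "Ready". A cordoned node ("Ready,SchedulingDisabled") is not reported.
not_ready_nodes() {
    awk 'NR > 1 && $2 !~ /^Ready/ { print $1 }'
}

# Usage on a live cluster (not run here):
#   kubectl get nodes | not_ready_nodes
#   kubectl describe node <name printed above>
```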
Check Node
top
top - 14:43:56 up 3 days, 19:02, 1 user, load average: 0.35, 0.29, 0.21
Tasks: 112 total, 1 running, 72 sleeping, 0 stopped, 0 zombie
%Cpu(s): 3.9 us, 1.7 sy, 0.1 ni, 94.3 id, 0.0 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem : 1009112 total, 74144 free, 736608 used, 198360 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 129244 avail Mem

  PID USER  PR  NI    VIRT    RES   SHR S %CPU %MEM    TIME+ COMMAND
   34 root  20   0       0      0     0 S  5.9  0.0  0:13.14 kswapd0
28826 999   20   0 1361320 383208  3596 S  5.9 38.0  0:46.95 mysqld
    1 root  20   0   78260   5924  3192 S  0.0  0.6  0:21.88 systemd
    2 root  20   0       0      0     0 S  0.0  0.0  0:00.02 kthreadd
    4 root   0 -20       0      0     0 I  0.0  0.0  0:00.00 kworker/0:0H

df -h
Filesystem   Size   Used   Avail   Use%   Mounted on
udev         481M   0      481M    0%     /dev
tmpfs        99M    1000K  98M     1%     /run
/dev/sda1    9.7G   5.3G   4.5G    55%    /
tmpfs        493M   0      493M    0%     /dev/shm
tmpfs        5.0M   0      5.0M    0%     /run/lock
tmpfs        493M   0      493M    0%     /sys/fs/cgroup
tmpfs        99M    0      99M     0%     /run/user/1000
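Reading the df output by eye works for one node. For a quick pass over several, a threshold filter is one option. The sketch below assumes the six-column layout shown above (Use% in column 5, mount point in column 6), and full_filesystems is a hypothetical helper name.

```shell
# full_filesystems LIMIT (hypothetical helper): reads df output on stdin and
# prints "MOUNTPOINT USE%" for every filesystem at or above LIMIT percent.
full_filesystems() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)              # strip the percent sign
        if (use + 0 >= limit + 0) print $6, $5
    }'
}

# Usage on the node (not run here); -P keeps each filesystem on one line:
#   df -P | full_filesystems 80
```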
Check Kubelet Status
service kubelet status
● kubelet.service - Kubernetes Kubelet
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2019-03-20 14:22:06 UTC; 1 weeks 1 days ago
Docs: https://github.com/kubernetes/kubernetes
Main PID: 1281 (kubelet)
Tasks: 24 (limit: 1152)

sudo journalctl -u kubelet
-- Logs begin at Wed 2019-03-20 05:30:37 UTC, end at Mon 2019-04-01 14:42:42 UTC. --
Mar 20 08:12:59 worker-1 systemd[1]: Started Kubernetes Kubelet.
Mar 20 08:12:59 worker-1 kubelet[18962]: Flag --tls-cert-file has been deprecated, This parameter should be set via the config file specified by the Kubele
Mar 20 08:12:59 worker-1 kubelet[18962]: Flag --tls-private-key-file has been deprecated, This parameter should be set via the config file specified by the
Mar 20 08:12:59 worker-1 kubelet[18962]: I0320 08:12:59.915179 18962 flags.go:33] FLAG: --address="0.0.0.0"
Mar 20 08:12:59 worker-1 kubelet[18962]: I0320 08:12:59.918149 18962 flags.go:33] FLAG: --allow-privileged="true"
Mar 20 08:12:59 worker-1 kubelet[18962]: I0320 08:12:59.918339 18962 flags.go:33] FLAG: --allowed-unsafe-sysctls="[]"
Mar 20 08:12:59 worker-1 kubelet[18962]: I0320 08:12:59.918502 18962 flags.go:33] FLAG: --alsologtostderr="false"
Mar 20 08:12:59 worker-1 kubelet[18962]: I0320 08:12:59.918648 18962 flags.go:33] FLAG: --anonymous-auth="true"
Mar 20 08:12:59 worker-1 kubelet[18962]: I0320 08:12:59.918841 18962 flags.go:33] FLAG: --application-metrics-count-limit="100"
Mar 20 08:12:59 worker-1 kubelet[18962]: I0320 08:12:59.918974 18962 flags.go:33] FLAG: --authentication-token-webhook="false"
Mar 20 08:12:59 worker-1 kubelet[18962]: I0320 08:12:59.919096 18962 flags.go:33] FLAG: --authentication-token-webhook-cache-ttl="2m0s"
Mar 20 08:12:59 worker-1 kubelet[18962]: I0320 08:12:59.919299 18962 flags.go:33] FLAG: --authorization-mode="AlwaysAllow"
Mar 20 08:12:59 worker-1 kubelet[18962]: I0320 08:12:59.919466 18962 flags.go:33] FLAG: --authorization-webhook-cache-authorized-ttl="5m0s"
Mar 20 08:12:59 worker-1 kubelet[18962]: I0320 08:12:59.919598 18962 flags.go:33] FLAG: --authorization-webhook-cache-unauthorized-ttl="30s"
Mar 20 08:12:59 worker-1 kubelet[18962]: I0320 08:12:59.919791 18962 flags.go:33] FLAG: --azure-container-registry-config=""
Mar 20 08:12:59 worker-1 kubelet[18962]: I0320 08:12:59.919971 18962 flags.go:33] FLAG: --boot-id-file="/proc/sys/kernel/random/boot_id"
Mar 20 08:12:59 worker-1 kubelet[18962]: I0320 08:12:59.920102 18962 flags.go:33] FLAG: --bootstrap-checkpoint-path=""
Check Certificates
openssl x509 -in /var/lib/kubelet/worker-1.crt -text
Certificate:
Data:
Version: 3 (0x2)
Serial Number:
ff:e0:23:9d:fc:78:03:35
Signature Algorithm: sha256WithRSAEncryption
Issuer: CN = KUBERNETES-CA
Validity
Not Before: Mar 20 08:09:29 2019 GMT
Not After : Apr 19 08:09:29 2019 GMT
Subject: CN = system:node:worker-1, O = system:nodes
Subject Public Key Info:
Public Key Algorithm: rsaEncryption
Public-Key: (2048 bit)
Modulus:
00:b4:28:0c:60:71:41:06:14:46:d9:97:58:2d:fe:
a9:c7:6d:51:cd:1c:98:b9:5e:e6:e4:02:d3:e3:71:
58:a1:60:fe:cb:e7:9b:4b:86:04:67:b5:4f:da:d6:
6c:08:3f:57:e9:70:59:57:48:6a:ce:e5:d4:f3:6e:
b2:fa:8a:18:7e:21:60:35:8f:44:f7:a9:39:57:16:
4f:4e:1e:b1:a3:77:32:c2:ef:d1:38:b4:82:20:8f:
11:0e:79:c4:d1:9b:f6:82:c4:08:84:84:68:d5:c3:
e2:15:a0:ce:23:3c:8d:9c:b8:dd:fc:3a:cd:42:ae:
5e:1b:80:2d:1b:e5:5d:1b:c1:fb:be:a3:9e:82:ff:
a1:27:c8:b6:0f:3c:cb:11:f9:1a:9b:d2:39:92:0e:
47:45:b8:8f:98:13:c6:4d:6a:18:75:a4:01:6f:73:
f6:f8:7f:eb:5d:59:94:46:d8:da:37:75:cf:27:0b:
39:7f:48:20:c5:fd:c7:a7:ce:22:9a:33:4a:30:1d:
95:ef:00:bd:fe:47:22:42:44:99:77:5a:c4:97:bb:
37:93:7c:33:64:f4:b8:3a:53:8c:f4:10:db:7f:5f:
2b:89:18:d6:0e:68:51:34:29:b1:f1:61:6b:4b:c6:
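In the certificate dump above, the fields that matter for troubleshooting are the Issuer and the Validity dates. Rather than reading the full text output, the expiry can be tested directly with openssl's -checkend option. This is a sketch under stated assumptions: cert_expires_within is a hypothetical helper, and the certificate path in the usage lines is the one from the slide.

```shell
# cert_expires_within FILE SECONDS (hypothetical helper)
# Returns 0 when the certificate in FILE expires within SECONDS (or has
# already expired), 1 when it stays valid at least that long, and 2 when
# FILE cannot be read.
cert_expires_within() {
    [ -r "$1" ] || return 2
    # openssl's -checkend exits 0 when the cert will NOT expire in the window
    if openssl x509 -in "$1" -noout -checkend "$2" >/dev/null 2>&1; then
        return 1
    fi
    return 0
}

# Usage on the worker node (not run here):
#   openssl x509 -in /var/lib/kubelet/worker-1.crt -noout -enddate -issuer
#   cert_expires_within /var/lib/kubelet/worker-1.crt 604800 && echo "expires within 7 days"
```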

Network Failures
