
Inconsistent workloads and the pod ready state #129822


Description

@kangzhiqin

What would you like to be added?

The Kubernetes version is 1.28.1.

The sequence of events is as follows:

  1. A StatefulSet is used to create the pod webswingservice.
  2. A custom CSI plugin is deployed; webswingservice depends on volumes provided by this plugin.
  3. The first startup goes smoothly.
  4. The node is then powered off.
  5. After the node is powered back on, the custom CSI plugin has not recovered, so the volumes fail to attach/mount for webswingservice.
  6. Because the volumes fail to attach, the kubelet never runs the readiness probe for webswingservice. As a result, the pod's phase and conditions.Ready keep their old values and still report the pod as ready.
  7. The pod is therefore not actually ready, but the workload status is never updated, which produces the inconsistency.
  8. The following output shows the problem:
[root@master1 ~]# kubectl get po -A | grep webswingservice
webswingservice-0   0/2   Unknown   47h
webswingservice-1   0/2   Unknown   47h
webswingservice-2   0/2   Unknown   47h
[root@master1 ~]# kubectl get sts -n test webswingservice
NAME              READY   AGE
webswingservice   3/3     47h
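
To see the inconsistency directly, the stored pod status can be read from the API server and compared with what kubectl prints. Below is a minimal client-go sketch, assuming a kubeconfig at the default location; the test namespace is taken from the transcript above. It simply prints each pod's phase next to its Ready condition:

    package main

    import (
        "context"
        "fmt"

        v1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/tools/clientcmd"
    )

    func main() {
        // Assumption: kubeconfig at the default path (~/.kube/config).
        cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
        if err != nil {
            panic(err)
        }
        cs, err := kubernetes.NewForConfig(cfg)
        if err != nil {
            panic(err)
        }
        pods, err := cs.CoreV1().Pods("test").List(context.TODO(), metav1.ListOptions{})
        if err != nil {
            panic(err)
        }
        for _, p := range pods.Items {
            ready := "<none>"
            for _, c := range p.Status.Conditions {
                if c.Type == v1.PodReady {
                    ready = string(c.Status)
                }
            }
            fmt.Printf("%s  phase=%s  Ready=%s\n", p.Name, p.Status.Phase, ready)
        }
    }

If the description in step 6 is accurate, the pods above would still show phase=Running and Ready=True here, even though kubectl get po renders them as Unknown.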

Code snippets:

  1. In SyncPod, the kubelet registers the pod with the probe manager only after the volumes have been attached and mounted; when WaitForAttachAndMount fails, SyncPod returns early and the readiness probe is never started:

    // Wait for volumes to attach/mount
    if err := kl.volumeManager.WaitForAttachAndMount(ctx, pod); err != nil {
        if !wait.Interrupted(err) {
            kl.recorder.Eventf(pod, v1.EventTypeWarning, events.FailedMountVolume, "Unable to attach or mount volumes: %v", err)
            klog.ErrorS(err, "Unable to attach or mount volumes for pod; skipping pod", "pod", klog.KObj(pod))
        }
        return false, err
    }

    // Fetch the pull secrets for the pod
    pullSecrets := kl.getPullSecretsForPod(pod)

    // Ensure the pod is being probed
    kl.probeManager.AddPod(pod)

    if utilfeature.DefaultFeatureGate.Enabled(features.InPlacePodVerticalScaling) {
        // Handle pod resize here instead of doing it in HandlePodUpdates because
        // this conveniently retries any Deferred resize requests
        // TODO(vinaykul,InPlacePodVerticalScaling): Investigate doing this in HandlePodUpdates + periodic SyncLoop scan
        // See: https://github.com/kubernetes/kubernetes/pull/102884#discussion_r663160060
        if kl.podWorkers.CouldHaveRunningContainers(pod.UID) && !kubetypes.IsStaticPod(pod) {
            pod = kl.handlePodResourcesResize(pod)
        }
    }

  2. The StatefulSet controller then decides readiness purely from the pod's reported phase and Ready condition (see the sketch after this list):

    // isRunningAndReady returns true if pod is in the PodRunning Phase, if it has a condition of PodReady.
    func isRunningAndReady(pod *v1.Pod) bool {
        return pod.Status.Phase == v1.PodRunning && podutil.IsPodReady(pod)
    }
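
Putting the two snippets together: because the readiness probe never runs, the pod object in the API server keeps the Ready=True condition written before the power-off, and the StatefulSet controller's check still passes. The following self-contained sketch illustrates this; the helper re-implements the isRunningAndReady check inline (instead of importing podutil) purely for illustration:

    package main

    import (
        "fmt"

        v1 "k8s.io/api/core/v1"
    )

    // Mirrors the StatefulSet controller check quoted above: PodRunning phase
    // plus a Ready=True condition (inlined here instead of podutil.IsPodReady).
    func isRunningAndReady(pod *v1.Pod) bool {
        if pod.Status.Phase != v1.PodRunning {
            return false
        }
        for _, c := range pod.Status.Conditions {
            if c.Type == v1.PodReady && c.Status == v1.ConditionTrue {
                return true
            }
        }
        return false
    }

    func main() {
        // Stale status as left in the API server before the power-off: the kubelet
        // never re-ran the readiness probe, so the Ready condition was never cleared.
        stale := &v1.Pod{
            Status: v1.PodStatus{
                Phase: v1.PodRunning,
                Conditions: []v1.PodCondition{
                    {Type: v1.PodReady, Status: v1.ConditionTrue},
                },
            },
        }
        fmt.Println(isRunningAndReady(stale)) // true, so the StatefulSet counts it as ready
    }

This is why kubectl get sts reports 3/3 READY while kubectl get po shows the same pods as 0/2 Unknown.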

Why is this needed?

Is this considered a problem? Is there any solution or workaround?

Labels: kind/feature, lifecycle/rotten, needs-triage, sig/node, sig/storage
