Kubernetes controller manager may fail to manage deployment rollout with minReadySeconds and pod restarts #108266

@rtheis

Description

What happened?

kubectl rollout restart deployment alpine ; kubectl rollout status deployment alpine

may exceed the deployment's rollout progress deadline if the deployment sets minReadySeconds and a new pod restarts during the rollout before it has been ready for the minimum ready seconds.
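
This matches the documented semantics of minReadySeconds: a pod only counts as available once it has been ready, without interruption, for at least minReadySeconds, and a container restart resets that clock. A minimal, self-contained Go sketch of this availability rule (an illustration of the documented behavior, not the actual kube-controller-manager code) follows:

package main

import (
    "fmt"
    "time"
)

// podIsAvailable mirrors the documented minReadySeconds rule: the pod must
// currently be ready and must have been ready since readySince for at least
// minReadySeconds. A container restart flips the Ready condition to false
// and back, resetting readySince and restarting the countdown.
func podIsAvailable(ready bool, readySince time.Time, minReadySeconds int32, now time.Time) bool {
    if !ready {
        return false
    }
    return !readySince.Add(time.Duration(minReadySeconds) * time.Second).After(now)
}

func main() {
    now := time.Now()
    // Ready for only 15s with minReadySeconds=30: not yet available.
    fmt.Println(podIsAvailable(true, now.Add(-15*time.Second), 30, now)) // false
    // Ready for 40 uninterrupted seconds: available.
    fmt.Println(podIsAvailable(true, now.Add(-40*time.Second), 30, now)) // true
}

In the reproduction below, the container exits once after 15 seconds, so the new pod loses readiness before accruing 30 uninterrupted ready seconds; per this report, the controller manager may then fail to complete the rollout before the progress deadline.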

What did you expect to happen?

Deployment rollout is successful.

How can we reproduce it (as minimally and precisely as possible)?

Apply the following deployment on a 3-node cluster. We use 3 nodes here to simulate a multi-zone cluster that spreads pods across zones.

kind: Deployment
apiVersion: apps/v1
metadata:
  labels:
    app: alpine
  name: alpine
spec:
  minReadySeconds: 30  # each new pod must be ready for 30s before it counts as available
  replicas: 3
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      app: alpine
  strategy:
    rollingUpdate:
      maxSurge: 3
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: alpine
    spec:
      affinity:
        podAntiAffinity:  # require one pod per node, simulating spread across zones
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values: ["alpine"]
            topologyKey: kubernetes.io/hostname
      containers:
      - image: alpine:latest
        imagePullPolicy: IfNotPresent
        name: alpine
        command: ["sh", "-c", "if [[ ! -e /tmp/okay ]]; then touch /tmp/okay; sleep 15; exit; else sleep 100000; fi"]  # exit once after 15s to force a single container restart, then sleep forever
        volumeMounts:
        - mountPath: /tmp
          name: tmp
      volumes:
      - name: tmp
        emptyDir: {}

Then run kubectl rollout restart deployment alpine ; kubectl rollout status deployment alpine.
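
When the problem reproduces, kubectl rollout status eventually fails and reports that the deployment exceeded its progress deadline. Assuming the standard Progressing condition is set on the deployment, this can also be confirmed with kubectl get deployment alpine -o jsonpath='{.status.conditions[?(@.type=="Progressing")].reason}', which prints ProgressDeadlineExceeded once the deadline has passed.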

Anything else we need to know?

Removing minReadySeconds from the example deployment yields a successful rollout. In addition, finding the Kubernetes controller manager leader via kubectl get leases -n kube-system and killing the leader also allows the rollout to succeed. The rollout also succeeds if the old ReplicaSet is scaled down to zero replicas.
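
For reference, the last two workarounds can be driven from kubectl. The controller manager leader is named in the holderIdentity field of the kube-controller-manager lease, retrievable with kubectl get lease kube-controller-manager -n kube-system -o jsonpath='{.spec.holderIdentity}'. The old ReplicaSet can be found with kubectl get rs -l app=alpine and then scaled down with kubectl scale rs <old-replicaset-name> --replicas=0 (the name here is a placeholder for the deployment's previous ReplicaSet).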

Kubernetes version

Failure occurs on Kubernetes versions 1.20, 1.21, 1.22, 1.23 and 1.24.

Cloud provider

N/A

OS version

N/A

Install tools

IBM Cloud Kubernetes Service

Container runtime (CRI) and version (if applicable)

N/A

Related plugins (CNI, CSI, ...) and versions (if applicable)

N/A

Metadata

Labels

kind/bug: Categorizes issue or PR as related to a bug.
priority/important-longterm: Important over the long term, but may not be staffed and/or may need multiple releases to complete.
sig/apps: Categorizes an issue or PR as relevant to SIG Apps.
triage/accepted: Indicates an issue or PR is ready to be actively worked on.

Status: In Progress