
Option for acknowledging terminating Pods in Deployment rolling update #107920

Open
@atiratree

Description

What would you like to be added?

It could make sense to wait for a pod to be terminated before scheduling a new one, for the reasons mentioned below. Even though some of the issues can be partially mitigated by a proper setup of maxUnavailable and maxSurge, this is not applicable to all of them.

I would like to propose a new opt-in behaviour that would solve this. The Deployment controller would include Terminating pods in the computation of currently running replicas when deciding whether the new ReplicaSet should scale up (or the old one, in the case of proportional scaling).

This could be configured, for example, in .spec.strategy.rollingUpdate.scalingPolicy with the possible values (see the sketch after this list):

  1. IgnoreTerminatingPods - the default and current behaviour
  2. WaitForTerminatingPods
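
A minimal sketch of where the proposed field could live in a Deployment manifest. Note that scalingPolicy and its values exist only in this proposal and are not part of the current apps/v1 API; the rest of the manifest is an ordinary, illustrative Deployment.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example
spec:
  replicas: 10
  selector:
    matchLabels:
      app: example
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
      # Proposed, hypothetical field: count Terminating pods when deciding
      # whether the new ReplicaSet may scale up. Not part of the current API.
      scalingPolicy: WaitForTerminatingPods
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
      - name: app
        image: registry.k8s.io/pause:3.9
```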

The disadvantage of this feature is a slower rollout in environments that are not resource constrained. So, using this feature would be advised only for use cases similar to the ones mentioned below.

Why is this needed?

In some cases people are surprised that their Deployment can momentarily have more pods during a rollout than described (replicas - maxUnavailable <= availableReplicas <= replicas + maxSurge). The culprits are Terminating pods, which can run in addition to the Running and Starting pods.
Even though Terminating pods are not considered part of a Deployment, this can cause problems with resource usage and scheduling, as in the worked example and the two scenarios below:
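
As an illustration with assumed numbers (not taken from any of the linked reports):

```yaml
# Illustrative rollout arithmetic (assumed values):
spec:
  replicas: 10
  strategy:
    rollingUpdate:
      maxSurge: 2        # at most replicas + maxSurge = 12 non-terminating pods
      maxUnavailable: 1  # at least replicas - maxUnavailable = 9 available pods
# If 3 old pods are still Terminating (e.g. a long terminationGracePeriodSeconds),
# the cluster may momentarily run 12 + 3 = 15 pods of this Deployment.
```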

  1. Unnecessary autoscaling of nodes in tight environments, driving up cloud costs. This can hurt especially if

    relevant issues:

  2. A problem also arises in contended environments where pods are fighting for resources. The exponential backoff for pods that have not yet started can grow to large values and unnecessarily delay their start until they pop from the scheduling queue again, even once there are computing resources to run them. This can slow down the rollout considerably.

    relevant issue: During a rolling update, replica start gets caught in exponential backoff, causing unnecessary delay of up to 16 minutes. #98656

    In this issue the resources were limited by a quota, but this can happen for other reasons as well. In our use case we noticed that this can also occur in high-availability scenarios where pods are expected to run only on certain nodes and pod anti-affinity forbids running two pods on the same node (see the sketch below).
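
A minimal sketch of that kind of constraint (names and labels are illustrative): with a required anti-affinity on kubernetes.io/hostname, a pod that is still Terminating can keep occupying its node slot, so the replacement pod cannot be scheduled onto that node until the old one is fully gone.

```yaml
# Illustrative pod template fragment: at most one pod of this app per node.
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: example
        topologyKey: kubernetes.io/hostname
```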

Metadata

Assignees

No one assigned

Labels

kind/feature: Categorizes issue or PR as related to a new feature.
priority/important-longterm: Important over the long term, but may not be staffed and/or may need multiple releases to complete.
sig/apps: Categorizes an issue or PR as relevant to SIG Apps.
triage/accepted: Indicates an issue or PR is ready to be actively worked on.
