Horizontal Pod Autoscaler reporting FailedRescale error #132002

Closed
@sekar-saravanan

Description


What happened?

We are currently running several applications that are configured with metric-based scaling. These applications are expected to scale frequently in response to varying traffic patterns.

Occasionally, we observe the following error during scaling events (though it is not frequent), and there is no noticeable impact on the actual scaling process:

reason: FailedRescale, New size: <size>; reason: All metrics <below>/<above> target; error: Operation cannot be fulfilled on deployments.apps: the object has been modified; please apply your changes to the latest version and try again.

Our understanding is that this is a race condition: another controller or a CI/CD process patches or updates the Deployment object at the same moment the Horizontal Pod Autoscaler (HPA) attempts to scale it.
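
The same conflict can be reproduced by hand with kubectl, purely as an illustration of the API server's optimistic concurrency (my-app is a hypothetical deployment name):

$ kubectl get deployment my-app -o yaml > /tmp/stale.yaml    # snapshot carrying the current resourceVersion
$ kubectl annotate deployment my-app demo/touched=true --overwrite    # any write bumps the resourceVersion
$ kubectl replace -f /tmp/stale.yaml    # rejected, because the snapshot's resourceVersion is now stale

The final replace fails with the same Conflict message, since the API server refuses any full-object write whose resourceVersion no longer matches the stored object.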

In some cases, we observed the issue even when there were no apparent changes to the Deployment object itself; the only change seemed to be to its resourceVersion, and we were unable to confirm what caused it.

As a precaution, we have set up an alert that notifies us whenever the HPA fails to scale a deployment for any reason. This specific error, however, only generates noise and false positives in our alerting system.
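
For anyone tuning a similar alert in the meantime, the offending events can be listed directly; reason is a supported field selector for events:

$ kubectl get events -A --field-selector reason=FailedRescale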

What did you expect to happen?

We expect the Horizontal Pod Autoscaler (HPA) to perform retries internally and suppress the reporting of this error, as its occurrence generates unnecessary noise and may cause undue concern regarding system scaling.
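
A minimal sketch of the retry pattern we have in mind, built on client-go's stock conflict helper. This is not the HPA controller's actual code (the real controller writes through the scale subresource); the function name and the direct Deployment update are ours, for illustration only:

    package scaleutil

    import (
        "context"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/util/retry"
    )

    // scaleWithRetry re-reads the Deployment and reapplies the desired replica
    // count whenever the write loses the optimistic-concurrency race, instead
    // of surfacing the Conflict to the user as a FailedRescale event.
    func scaleWithRetry(ctx context.Context, client kubernetes.Interface, ns, name string, replicas int32) error {
        return retry.RetryOnConflict(retry.DefaultRetry, func() error {
            d, err := client.AppsV1().Deployments(ns).Get(ctx, name, metav1.GetOptions{})
            if err != nil {
                return err
            }
            d.Spec.Replicas = &replicas
            _, err = client.AppsV1().Deployments(ns).Update(ctx, d, metav1.UpdateOptions{})
            return err // RetryOnConflict retries only when this is a 409 Conflict
        })
    }

With this shape, a lost race is retried against fresh state a few times before anything is reported, which is the behaviour we would like the HPA to expose.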

How can we reproduce it (as minimally and precisely as possible)?

To reproduce the issue:

  • Configure a Deployment and Horizontal Pod Autoscaler (HPA) with CPU or memory-based scaling.
  • Apply load to the Deployment pods to trigger scaling by the HPA.
  • Simultaneously, perform multiple consecutive patch operations on the Deployment (a loop sketch follows below).

This sequence will result in the HPA reporting the same error in its events.
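
Step 3 can be driven with a simple loop (my-app is a hypothetical deployment name; the annotation patch is a no-op for the workload but bumps the resourceVersion on every iteration):

$ while true; do
    kubectl patch deployment my-app --type=merge \
      -p "{\"metadata\":{\"annotations\":{\"repro/ts\":\"$(date +%s)\"}}}"
  done

While the loop runs and the HPA is actively rescaling under load, the FailedRescale events appear on the HPA object.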

Anything else we need to know?

NA

Kubernetes version

$ kubectl version
Client Version: v1.30.3
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.31.7-eks-bcf3d70

Cloud provider

AWS - EKS

OS version

# On macOS:

OS: 15.0.1

$ uname -a
Darwin NOVI-QH6F676G2Y 24.0.0 Darwin Kernel Version 24.0.0: Tue Sep 24 23:39:07 PDT 2024; root:xnu-11215.1.12~1/RELEASE_ARM64_T6000 arm64




Install tools

NA

Container runtime (CRI) and version (if applicable)

NA

Related plugins (CNI, CSI, ...) and versions (if applicable)

NA

Metadata

Labels

  • kind/bug: Categorizes issue or PR as related to a bug.
  • needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
  • sig/apps: Categorizes an issue or PR as relevant to SIG Apps.
  • sig/autoscaling: Categorizes an issue or PR as relevant to SIG Autoscaling.
