
[FG:InPlacePodVerticalScaling] Add debug log, when drop pod update message and only old update is processed #129539

Open
wants to merge 1 commit into
base: master

Conversation

zanzan-wang

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

Add a debug log so that users can tell when the kubelet drops a pod update message.

Which issue(s) this PR fixes:

Fixes #129518

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jan 9, 2025

linux-foundation-easycla bot commented Jan 9, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 9, 2025
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot
Contributor

Welcome @zanzan-wang!

It looks like this is your first PR to kubernetes/kubernetes 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/kubernetes has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 9, 2025
@k8s-ci-robot
Contributor

Hi @zanzan-wang. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Jan 9, 2025
@k8s-ci-robot k8s-ci-robot added area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 9, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: zanzan-wang
Once this PR has been reviewed and has the lgtm label, please assign tallclair for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@@ -964,6 +964,7 @@ func (p *podWorkers) UpdatePod(options UpdatePodOptions) {
 	select {
 	case podUpdates <- struct{}{}:
 	default:
+		klog.V(4).InfoS("Pending update already queued", "podUID", podUID, "updateType", options.UpdateType)
Member

Why do we need this log content? Will this cause kubelet to have too many useless logs?

Author

Thanks for your review.
We think that if a pod update message is discarded without being processed, that exceptional case should be logged. It does not occur frequently.
The log is emitted at verbosity level 4, so it only appears when debug-level logging is enabled.

Contributor

@zanzan-wang

We think that if a pod update message is discarded without being processed

Isn't it discarded when there is another pending update waiting in the channel? If so, what additional information does this message provide?

Author

Yes, when another message is being processed, the pending message is dropped.
When this happens, the kubelet should give the user some clue about it. A kubelet log line may be necessary.

Author

Or do you mean that the description "Pending update already queued" is not appropriate?

Author

We hit this while testing the InPlacePodVerticalScaling feature, but we think it is common for others too.
When the interval between two patch commands is relatively small, the resize may not succeed. This log helps us quickly identify the reason.

Contributor

Do you mean that the pod worker is processing one update when a new update arrives, and the new one is dropped? This is disturbing and, in my opinion, signals a bug or race.

Let's see what other reviewers say.

Author

Yes, because the channel buffer is 1: podUpdates = make(chan struct{}, 1)
We checked some other channel handling and the logic is the same: if the message cannot be sent because the buffer is full, it is dropped.
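
The non-blocking send on a channel with a buffer of 1, as described in this comment, can be sketched in isolation. This is a minimal illustration and not kubelet code: `trySignal` is a hypothetical helper, and the boolean it returns only marks the branch where the proposed log line would be emitted.

```go
package main

import "fmt"

// trySignal attempts a non-blocking send on a signal channel.
// It returns true if the signal was queued, and false if the buffer was
// already full, so the select fell through to the default branch.
// (Hypothetical helper for illustration; not part of the kubelet.)
func trySignal(ch chan<- struct{}) bool {
	select {
	case ch <- struct{}{}:
		return true
	default:
		// This is the branch where the PR adds its debug log.
		return false
	}
}

func main() {
	// Buffer of 1, matching the podUpdates channel quoted above.
	signals := make(chan struct{}, 1)

	fmt.Println(trySignal(signals)) // true: the buffer was empty
	fmt.Println(trySignal(signals)) // false: a signal is already pending

	<-signals                       // a consumer drains the pending signal
	fmt.Println(trySignal(signals)) // true: there is room again
}
```

The channel element type is `struct{}`, so the send carries no payload. What is skipped in the default branch is a wake-up notification, which matters for how the dropped send should be described in the log message.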

Member

We hit this while testing the InPlacePodVerticalScaling feature, but we think it is common for others too.

/cc @tallclair @vinaykul

@zanzan-wang can you provide some more information about the problem you hit with in-place update? What's the impact?

Author

Sometimes our app sends several pod resize requests within a short time.
If the previous update has not finished, the new pod resize request is dropped without any warning.
We cannot easily tell whether it was dropped or hit some other processing error.
This log would help us debug.

@AmarNathChary
Contributor

/easy cla

@AmarNathChary
Contributor

Hey @zanzan-wang please sign the CLA before raising a PR

@zanzan-wang
Author

/easy cla

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Jan 13, 2025
@@ -964,6 +964,7 @@ func (p *podWorkers) UpdatePod(options UpdatePodOptions) {
 	select {
 	case podUpdates <- struct{}{}:
 	default:
+		klog.V(4).InfoS("Pending update already queued", "podUID", uid, "updateType", options.UpdateType)
Member

Suggested change
-		klog.V(4).InfoS("Pending update already queued", "podUID", uid, "updateType", options.UpdateType)
+		klog.V(4).InfoS("Pending update already queued", "pod", klog.KRef(ns, name), "podUID", uid, "updateType", options.UpdateType)

Author

Thanks for your suggestion, I will update it later.

@pacoxu
Member

pacoxu commented Jan 13, 2025

It sounds valid.
/cc @SergeyKanzhelev @bart0sh

@k8s-ci-robot k8s-ci-robot added the do-not-merge/contains-merge-commits Indicates a PR which contains merge commits. label Jan 14, 2025
@zanzan-wang
Author

Can you help review this, thanks!
/cc @wzshiming @dchen1107 @SergeyKanzhelev @bart0sh

@bart0sh
Contributor

bart0sh commented Jan 15, 2025

@zanzan-wang Please, squash commits into one.

/release-note none

@bart0sh
Contributor

bart0sh commented Jan 15, 2025

/release-note-none

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jan 15, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/contains-merge-commits Indicates a PR which contains merge commits. label Jan 15, 2025
Add debug log, when drop pod update message

update according to suggestion
@bart0sh
Contributor

bart0sh commented Jan 15, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 15, 2025
@zanzan-wang
Author

/cc @wzshiming @dchen1107 @SergeyKanzhelev

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 17, 2025
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle rotten
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 17, 2025
@shiya0705

shiya0705 commented May 29, 2025

/cc @dchen1107 @wzshiming @bart0sh @SergeyKanzhelev
It seems valid; how about giving it an lgtm?

@pacoxu
Member

pacoxu commented May 29, 2025

/remove-lifecycle rotten
/test pull-kubernetes-cmd

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label May 29, 2025
@SergeyKanzhelev
Member

@tallclair based on description it seems that this is a bigger issue, not simply the lack of a log message:

Sometimes our app sends several pod resize requests within a short time.
If the previous update has not finished, the new pod resize request is dropped without any warning.
We cannot easily tell whether it was dropped or hit some other processing error.
This log would help us debug.

/retitle [FG:InPlacePodVerticalScaling] Add debug log, when drop pod update message and only old update is processed

@k8s-ci-robot k8s-ci-robot changed the title Add debug log, when drop pod update message [FG:InPlacePodVerticalScaling] Add debug log, when drop pod update message and only old update is processed Jun 4, 2025
@shiya0705

@tallclair based on description it seems that this is a bigger issue, not simply the lack of a log message:

Sometimes our app sends several pod resize requests within a short time.
If the previous update has not finished, the new pod resize request is dropped without any warning.
We cannot easily tell whether it was dropped or hit some other processing error.
This log would help us debug.

/retitle [FG:InPlacePodVerticalScaling] Add debug log, when drop pod update message and only old update is processed

With the current code, the pod resize is eventually applied according to the last pod resize configuration (even though the intermediate requests that could not be processed in time were dropped), which seems reasonable.
However, we can add a log that explicitly tells the user that not every request was acted on.
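
The "last configuration wins" behaviour described in this comment can be sketched as a coalescing worker. This is a simplified illustration under stated assumptions, not the kubelet's pod worker: the `resize` and `worker` types, the `dropped` counter, and the method names are all hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// resize is a hypothetical stand-in for a pod resize request.
type resize struct{ cpuMillis int }

// worker keeps only the most recent pending request. The channel carries
// no data; it only says "something is pending".
type worker struct {
	mu      sync.Mutex
	pending *resize
	signal  chan struct{}
	dropped int // sends that found the one-slot buffer already full
}

func newWorker() *worker {
	return &worker{signal: make(chan struct{}, 1)}
}

// update overwrites the pending request, then signals without blocking.
func (w *worker) update(r resize) {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.pending = &r
	select {
	case w.signal <- struct{}{}:
	default:
		w.dropped++ // a debug log line could be emitted here
	}
}

// processOne waits for a signal and returns the latest pending request.
func (w *worker) processOne() resize {
	<-w.signal
	w.mu.Lock()
	defer w.mu.Unlock()
	r := *w.pending
	w.pending = nil
	return r
}

func main() {
	w := newWorker()
	// Three requests arrive before the worker gets a chance to run.
	for _, cpu := range []int{100, 200, 300} {
		w.update(resize{cpuMillis: cpu})
	}
	got := w.processOne()
	fmt.Println(got.cpuMillis, w.dropped) // prints "300 2"
}
```

In this sketch the intermediate values 100 and 200 are never processed, but the final value 300 is, which is the outcome this comment describes. A log line in the default branch would report that two notifications were coalesced, not that the final configuration was lost.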

Labels
area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
Development

Successfully merging this pull request may close these issues.

Report event or record error/info log when drop podUpdates message
9 participants