The second time the pod is deleted the grace period does not take effect #113883
Conversation
Hi @Seaiii. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@matthyx Hi, I've re-posted the new PR here; I think this one works better with the latest version.
@bobbypage @endocrimes 👋🏻👋🏻 Hi, this problem is plaguing our team again. I saw that the k8s team was working on this issue a week ago, so I made some changes on that basis. Fixes #113408. Please review.
/ok-to-test
/cc
OK, thank you for your comments. I'll take a look at it, but it should be fine for now, since the context comes from #113591.
Please add the header from
Oh, sorry, I can't believe I forgot about this. Thanks for the reminder; I don't have my work computer back yet, so I'll add it first thing tomorrow.
at least the first one
@smarterclayton @dashpole 👋🏻👋🏻 Hi, I saw that you were working on this a week ago. Is this PR of mine related to the problem you're working on? Let me know if I can help.
@Seaiii it now looks much better... the issue is ready for triage now.
OK, I will do it now. But main-v2 still FAILED; does that affect anything?
@Seaiii: Reopened this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/retest
@mikebrow @samuelkarp does containerd handle multiple stop requests coming in? we had to implement custom logic to handle this case in cri-o
Yes, both containerd and Docker can handle multiple requests; I've tested this in local integration. As long as the kubelet can send two gRPCs, it solves the problem.
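To make "two gRPCs" concrete, here is a minimal, hypothetical sketch of the idea, not kubelet code: the helper name, connection setup, and timeout values are assumptions. It issues a second CRI StopContainer request with a shorter timeout while the first is still in flight, relying on the runtime accepting repeated stop requests as described above.

```go
package sketch

import (
	"context"
	"time"

	"google.golang.org/grpc"
	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1"
)

// stopTwice is a hypothetical helper: it issues a StopContainer call with a
// long timeout, then a second one with a shorter timeout for the same
// container. Per the discussion above, runtimes such as containerd do not
// reject the second request; they apply the shorter deadline.
func stopTwice(ctx context.Context, conn *grpc.ClientConn, containerID string) error {
	client := runtimeapi.NewRuntimeServiceClient(conn)

	// First stop request: 100s grace period, left running in the background.
	go func() {
		_, _ = client.StopContainer(ctx, &runtimeapi.StopContainerRequest{
			ContainerId: containerID,
			Timeout:     100,
		})
	}()

	// Second stop request a moment later: 10s grace period. The runtime
	// accepts it even though a stop is already in progress.
	time.Sleep(time.Second)
	_, err := client.StopContainer(ctx, &runtimeapi.StopContainerRequest{
		ContainerId: containerID,
		Timeout:     10,
	})
	return err
}
```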
```go
case <-ctx.Done():
	klog.V(2).InfoS("Context expired while executing PreStop hook", "pod", klog.KObj(pod), "podUID", pod.UID,
		"containerName", containerSpec.Name, "containerID", containerID.String(), "gracePeriod", gracePeriod)
	return int64(metav1.Now().Sub(start.Time).Seconds())
```
doesn't look to me like we need this? we'll fall through to the return directly below, which looks identical
The logic here is the same as shortening the grace period. There are two paths:
- If done is closed, the gRPC call finished and execution ended normally.
- If ctx.Done() fires, the gRPC call is blocking and the grace period needs to be shortened; the blocked wait is cancelled via ctx, which is passed down from kubelet.go through pod_workers.go.

1. done is a channel that I register locally.
2. ctx.Done() comes from the context passed in by pod_workers.go.

These are two separate processes and two separate pieces of logic. The principle is that pod_workers.go registers the context's cancel function, and selecting on ctx.Done() receives the cancellation.
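A minimal sketch of the pattern being described; the names are illustrative, not the actual kubelet code. The hook runs in a goroutine that closes a local done channel, while ctx.Done() delivers the cancellation registered in pod_workers.go:

```go
package sketch

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// runWithGrace runs a (possibly blocking) call and waits for whichever comes
// first: normal completion (done) or cancellation from the pod worker layer
// (ctx.Done()). It returns the seconds actually spent, mirroring the grace
// period accounting discussed above.
func runWithGrace(ctx context.Context, call func() error) int64 {
	start := metav1.Now()
	done := make(chan struct{})

	go func() {
		defer close(done) // closed when the gRPC/hook call returns normally
		_ = call()
	}()

	select {
	case <-done:
		// Normal path: the call finished on its own.
	case <-ctx.Done():
		// The context was cancelled from pod_workers.go, e.g. because a
		// second deletion shortened the grace period; stop waiting now.
	}
	return int64(metav1.Now().Sub(start.Time).Seconds())
}
```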
I mean specifically the return int64(metav1.Now().Sub(start.Time).Seconds()) seems duplicated with line 681, which directly follows it.
Oh, yeah, you're right. I've made the changes.
Remove redundant return
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:
- Reopen this PR with /reopen
- Mark this PR as fresh with /remove-lifecycle rotten

Please send feedback to sig-contributor-experience at kubernetes/community.

/close
@k8s-triage-robot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/reopen
@SergeyKanzhelev: Reopened this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Oh, I didn't mean to reopen a PR; I thought this was an issue. /close @Seaiii feel free to reopen and bring it to SIG Node's attention.
@SergeyKanzhelev: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/reopen
@Seaiii: Reopened this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
@Seaiii: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
What type of PR is this?
/kind bug
/sig node
What this PR does / why we need it:
Deleting the same pod a second time doesn't work, and force deletion doesn't work either: it only lets the apiserver delete the pod object, while the kubelet never deletes the containers.
The prerequisite is that the container does not stop on its own (e.g. the container is running sleep 9999).
When I first run kubectl delete pod --grace-period=30 and then kubectl delete pod --grace-period=0 --force, the second deletion (including the forced one) has no effect on the kubelet.
According to the latest documentation, the grace period set on a second deletion should trigger the kubelet to start cleaning up immediately. But looking at the source code, the first delete has to wait for the CRI to return a result, so the second grace period is only processed after the first one completes. The code is therefore inconsistent with the documented behavior.
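As a hypothetical sketch of the intended behavior described above; the type and function names are illustrative, not the actual pod worker implementation. The idea is to keep the cancel function of the in-flight termination so that a later delete with a shorter grace period can interrupt the blocked CRI call instead of queuing behind it:

```go
package sketch

import (
	"context"
	"sync"
)

// terminator is a hypothetical stand-in for a pod worker that remembers the
// cancel function of the termination currently in flight.
type terminator struct {
	mu     sync.Mutex
	cancel context.CancelFunc
}

// terminate starts (or restarts) termination with the given grace period.
// If a previous termination is still blocked waiting on the CRI, it is
// cancelled so the new, shorter grace period takes effect immediately.
func (t *terminator) terminate(parent context.Context, gracePeriod int64, stop func(ctx context.Context, gracePeriod int64) error) error {
	ctx, cancel := context.WithCancel(parent)

	t.mu.Lock()
	if t.cancel != nil {
		t.cancel() // interrupt the in-flight termination
	}
	t.cancel = cancel
	t.mu.Unlock()

	// Blocks until the CRI returns or ctx is cancelled by a later call.
	return stop(ctx, gracePeriod)
}
```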
Which issue(s) this PR fixes:
Fixes #113712
Fixes #113717
Fixes #83916
Fixes #87039
Special notes for your reviewer:
1. The second deletion cannot shorten the grace period.
2. This issue has existed since the kubelet pod workers were reworked after version 1.22.

For example:
1. kubectl delete pod <name> --grace-period=100
2. kubectl delete pod <name> --grace-period=10

The second deletion reduces the grace period to 10s, which should take effect immediately.
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
none