-
Notifications
You must be signed in to change notification settings - Fork 40.7k
[FG:InPlacePodVerticalScaling] Add Pod resize complete event #130387
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Hi @shiya0705. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/ok-to-test |
pkg/kubelet/kubelet.go
Outdated
@@ -2872,6 +2872,40 @@ func (kl *Kubelet) canResizePod(pod *v1.Pod) (bool, v1.PodResizeStatus, string) | |||
return true, v1.PodResizeStatusInProgress, "" | |||
} | |||
|
|||
|
|||
func resizeMsgGenerate(allocatedPod *v1.Pod, podStatus *kubecontainer.PodStatus) string { | |||
var msg string = "}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we just use "}" ? This might make the return value a bit weird. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed code, var msg string = "}" is compared with msg = fmt.Sprintf("Pod resize complete, {") + msg
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that this is not a correct way to initialize. Can we use like [] string
to get msg and Join operation at the end instead of this string concatenation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your comments, I change the string definition.
8159cfe
to
1435a12
Compare
/retest |
a0049a9
to
c13720a
Compare
/retest |
pkg/kubelet/kubelet.go
Outdated
@@ -2967,8 +3013,19 @@ func (kl *Kubelet) handlePodResourcesResize(pod *v1.Pod, podStatus *kubecontaine | |||
// If a resize is in progress, make sure the cache has the correct state in case the Kubelet restarted. | |||
kl.statusManager.SetPodResizeInProgressCondition(pod.UID, "", "", false) | |||
} else { | |||
PodResizeConditionsLenBefore := len(kl.statusManager.GetPodResizeConditions(pod.UID)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than tracking this here, let's just make ClearPodResizeInProgressCondition
return whether a condition was cleared. Then the logic just becomes:
if kl.statusManager.ClearPodResizeInProgressCondition(pod.UID) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tallclair do you still want the event emitted when pending resize is reverted? If so, just checking that the pod ResizeInProgress condition is cleared is not enough (since pending resizes are tracked in the other condition). To handle that I think we'd need to move this line to the beginning of the function and unconditionally check if the entire resize status has been cleared at the end of the defer block (which is what I was suggesting in #130387 (comment))
(referencing your comment #127172 (comment))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your comments.
-
I agree with natasha's comment, just checking pod ResizeInProgress condition is cleared is not enough, but this has nothing to do with "event emitted when pending resize is reverted". If only check if kl.statusManager.ClearPodResizeInProgressCondition(pod.UID),after resize completed,resize complete event will be emitted continually every time when SyncPod is called. Because func handlePodResourcesResize defer block will check isPodResizeInProgress even after pod resize is completed, this is not what we want. We just want to emit complete event once resize is finished.
-
I think current code has satisfied this requirement, emit complete event when pending resize is reverted.
Pending resize is reverted, first both InProgress and Pending condition are cleared,then set InProgress condition,after resize complete,clear InProgress condition( emit complete event). The detailed is as below.
- When pending
->SetPodResizePendingCondition - pending reverted
-> ClearPodResizePendingCondition/ClearPodResizeInProgressCondition(func handlePodResourcesResize)
-> SetPodResizeInProgressCondition (func (kl *Kubelet) SyncPod)
-> ClearPodResizeInProgressCondition(pod.UID) (defer func() in func handlePodResourcesResize )
Please correct me if there is any error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tallclair
Could you please give your comment?
pkg/kubelet/kubelet.go
Outdated
@@ -2953,6 +2954,51 @@ func disallowResizeForSwappableContainers(runtime kubecontainer.Runtime, desired | |||
return false, "" | |||
} | |||
|
|||
// PodResourceSummary save container and pod-level resources for resizing | |||
type PodResourceSummary struct { | |||
// TODO: Add pod-level resources here once resizing pod-level resources is supported |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI @ndixita
pkg/kubelet/kubelet.go
Outdated
// PodResourceSummary save container and pod-level resources for resizing | ||
type PodResourceSummary struct { | ||
// TODO: Add pod-level resources here once resizing pod-level resources is supported | ||
InitContainers map[string]v1.ResourceRequirements `json:"InitContainers,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm torn on the type for this. The map[string]v1.ResourceRequriements
is a more compact json representation, but it doesn't preserve container order. The alternative would be something like
InitContainers []ContainerResourceSummary
Containers []ContainerResourceSummary
// ...
type ContainerResourceSummary struct {
Name string
Resources v1.ResourceRequirements
}
I'm leaning toward the latter, since it would basically present as a filtered view of the pod spec, rather than a completely new type, if that makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get your consideration, applied it to the new code
pkg/kubelet/kubelet.go
Outdated
|
||
containerStatus := podStatus.FindContainerStatusByName(allocatedContainer.Name) | ||
if containerStatus == nil || containerStatus.State != kubecontainer.ContainerStateRunning { | ||
// If the container isn't running, it doesn't need to be saved to completed event |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree with this. The resize might not have been "actuated", but a waiting container can still be resized - it will use the newly allocated values when it starts up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I delete this condition judgment in the new code, so that even the containerStatus is nil, but container resource in the completed message is displayed.
pkg/kubelet/kubelet.go
Outdated
// If pod resize completed, print pod resize complete event | ||
if (PodResizeConditionsLenBefore != 0) && (len(kl.statusManager.GetPodResizeConditions(pod.UID)) == 0) { | ||
// PodResizecompleteAnnotations including pod all resizeable container resrouce, it is printed in event annotaions | ||
PodResizecompleteAnnotations := kl.getPodResizeInfo(allocatedPod, podStatus) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why put this in annotations rather than the event message? If it's part of the message, we don't need to worry about the timestamp.
Alternatively, if it's too much information to dump in the message, maybe we could put the allocated generation in the message instead of the timestamp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put it to annotations to refer this comment, but annotations do not update if event message content is same, so added a time stamp to distinguish different times of resize completed event.
#127172 (comment)
But add a timestamp in the event message is indeed strange.
In the new code, I deleted timestamp and moved detailed information from annotations to the message.
pkg/kubelet/kubelet.go
Outdated
} | ||
|
||
// getPodResizeInfo to cache containers and pod resources CPU and memory resources once resizing is completed | ||
func (kl *Kubelet) getPodResizeInfo(allocatedPod *v1.Pod, podStatus *kubecontainer.PodStatus) map[string]string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to this. I think it should have its own dedicated test.
ea9c2c0
to
64d90ff
Compare
pkg/kubelet/kubelet.go
Outdated
@@ -2967,8 +3015,18 @@ func (kl *Kubelet) handlePodResourcesResize(pod *v1.Pod, podStatus *kubecontaine | |||
// If a resize is in progress, make sure the cache has the correct state in case the Kubelet restarted. | |||
kl.statusManager.SetPodResizeInProgressCondition(pod.UID, "", "", false) | |||
} else { | |||
// podResizeConditionsOriginalLen is used to save original resize condition | |||
podResizeConditionsOriginalLen := len(kl.statusManager.GetPodResizeConditions(pod.UID)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adding this here just so it doesn't get lost but make sure you get an answer from @tallclair on #130387 (comment) on the desired behavior
pkg/kubelet/kubelet.go
Outdated
@@ -2967,8 +3015,18 @@ func (kl *Kubelet) handlePodResourcesResize(pod *v1.Pod, podStatus *kubecontaine | |||
// If a resize is in progress, make sure the cache has the correct state in case the Kubelet restarted. | |||
kl.statusManager.SetPodResizeInProgressCondition(pod.UID, "", "", false) | |||
} else { | |||
// podResizeConditionsOriginalLen is used to save original resize condition | |||
podResizeConditionsOriginalLen := len(kl.statusManager.GetPodResizeConditions(pod.UID)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are holding off on merging #131157 for the time being, meaning I wouldn't expect the code that backfills the status manager conditions from there to be merged any time soon. If we want to move forward here, can you copy the condition backfilling into this PR? Basically this section
(This is a follow-up to my other comment: #130387 (comment))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
edfadd9
to
9eb657a
Compare
pkg/kubelet/kubelet.go
Outdated
for _, c := range pod.Status.Conditions { | ||
switch c.Type { | ||
case v1.PodResizePending: | ||
kl.statusManager.SetPodResizePendingCondition(pod.UID, c.Reason, c.Message) | ||
case v1.PodResizeInProgress: | ||
kl.statusManager.SetPodResizeInProgressCondition(pod.UID, c.Reason, c.Message, true) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@natasha41575, for this comment #130387 (comment)
I merged backfills status manager conditions code to here, is my understanding correct?
ac792a4
to
4087930
Compare
4087930
to
5451237
Compare
ae0472a
to
79887c0
Compare
b46eeb7
to
780529e
Compare
@shiya0705: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
@shiya0705 I was originally thinking it would be best to get this in before #131612, so that you could avoid the merge conflict. But after reflection, I realized that if we do that, all the work you've doing here would more or less be wasted, because #131612 completely changes the logic for resize allocation and determining if a resize is done. It would also mean that we'd have to deal with the merge conflict in #131612, but #131612 is a stronger priority for 1.34. I believe the refactor in #131612 should also greatly simplify the logic required for emitting the resize event, which I think will make it easier for both you and reviewers. If you are willing to wait for #131612 and rebase on it, I can prioritize review of this one after. |
@natasha41575 I get your idea, it make sense to make this event after resize logic become easier and clearer. I will keep monitoring #131612 and rebase code after it is ready. |
What type of PR is this?
/kind feature
What this PR does / why we need it:
Add events when pod resize complete
Which issue(s) this PR fixes:
Fixes #127172
Special notes for your reviewer:
N/A
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
N/A
Does this PR introduce a user-facing change?
Event is a user-facing change