DRA kubelet: connection monitoring #132058
Conversation
Replaced the Manager interface with a simple struct. There was no need for the interface. Replaced the global instance of the DRA plugins store with a fresh instance owned by the manager. This makes unit testing a bit easier (no need to restore state; it would enable parallelizing long-running tests).

Simplified the plugin.PluginsStore type to just "plugin.Store" because it stuttered. The Plugin type is kept because having one struct named after its package is a common exception to the "don't stutter" guideline. For the sake of clarity, the package gets imported as "draplugin" (= <parent dir>/<package>).

Removed the unused NewManager "node" parameter and replaced direct construction of cache and manager with calls to NewManager, because that is how the manager should get constructed (less code, too).

Fixed the incorrect description of Manager: the plugin store is the entity which manages drivers, not the manager. The manager is focused on the DRA logic around ResourceClaims.
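For orientation, here is a minimal sketch of the resulting shape. This is not the actual kubelet code; the type layout, fields, and constructor parameters are simplified assumptions:

```go
package dra

import "sync"

// Plugin is a stand-in for the per-driver handle (gRPC endpoint etc.).
type Plugin struct {
	driverName string
	endpoint   string
}

// Store tracks how to reach the plugins registered for DRA drivers.
// It replaces the former package-global pluginsStore instance.
type Store struct {
	sync.RWMutex
	// There may be more than one registered endpoint per driver.
	plugins map[string][]*Plugin
}

func NewStore() *Store {
	return &Store{plugins: map[string][]*Plugin{}}
}

// Manager is a plain struct; the interface was dropped because there is
// only one implementation. It owns its own Store instead of sharing a
// global one, which keeps unit tests independent of each other.
type Manager struct {
	store *Store
	// ... fields for the ResourceClaim handling omitted
}

// NewManager is how the manager is meant to be constructed; callers no
// longer build the cache and store themselves, and the unused "node"
// parameter is gone.
func NewManager() *Manager {
	return &Manager{store: NewStore()}
}
```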
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label.
type pluginsStore struct {
	sync.RWMutex
	// Store keeps track of how to reach plugins registered for DRA drivers.
	// Each plugin has a gRPC endpoint. There may be more than one plugin per driver.
This is my attempt to rationalize the difference between "kubelet plugin" and "DRA driver".
The DRA manager has traditionally favored "plugin" and "plugin name" where the rest of the system uses the terms "DRA driver" and "DRA driver name". IMHO this is unnecessary, and the code would have been more readable if the use of "plugin" had been restricted to the registration mechanism. Even with some of my updates, that separation is not complete or consistent.
Would it be worth doing more renames?
It gets worse: sometimes the plugin instance is also called "client", leading to "NewDRAPluginClient". Apropos, the "New" is also wrong there because it does not create an instance.
I'll add a commit with further renaming...
It's not just code readability: kubelet also logs "pluginName" as an attribute where kube-scheduler and other components use "driverName". This is visible to users.
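For illustration, a hypothetical call site showing the difference in structured logging keys (using klog; not taken from the actual code):

```go
package dra

import "k8s.io/klog/v2"

func logRegistration(logger klog.Logger, driverName, endpoint string) {
	// Before: kubelet-specific attribute key.
	logger.Info("DRA plugin registered", "pluginName", driverName, "endpoint", endpoint)

	// After: the key that kube-scheduler and other components already use.
	logger.Info("DRA driver registered", "driverName", driverName, "endpoint", endpoint)
}
```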
Force-pushed from e37d351 to 6c65e86.
/test pull-kubernetes-node-e2e-containerd-2-0-dra
Force-pushed from 6c65e86 to 23b2871.
	ginkgo.By("wait for ResourceSlice removal")
	gomega.Eventually(ctx, listResources).Should(gomega.BeEmpty(), "ResourceSlices without plugin")
	gomega.Consistently(ctx, listResources).WithTimeout(5*time.Second).Should(gomega.BeEmpty(), "ResourceSlices without plugin")
})
Please add test cases for reconnection. It would be nice to test two scenarios: reconnection before wiping has started, and reconnection after wiping is done.
The case for "reconnect before wiping" fits here: "ResourceSlices must not be removed if plugin restarts quickly enough"
The one for "reconnection after wiping" doesn't: we allow the slice to be deleted and have the restarted driver recreate it and see it remain, but that doesn't really tell us whether the kubelet has reconnected. I'm putting something next to "must be functional when plugin starts to listen on a service socket after registration": "Resource Kubelet Plugin must be functional after reconnect".
we allow the slice to be deleted and have the restarted driver recreate it and see it remain, but that doesn't really tell us whether the kubelet has reconnected.
I thought about making sure that the slice(s) are deleted by the kubelet before the driver publishes them again on reconnect, by doing the following (see the sketch after this list):
- stopping the plugin
- waiting for slices to be removed by kubelet
- starting the plugin
- maybe waiting for the slices to be updated (not sure whether the plugin updates them unconditionally on start)
- testing that it serves requests successfully
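A hypothetical Ginkgo sketch of those steps, in the style of the surrounding e2e test; helper names such as stopPlugin, startPlugin, listResources, and runPodWithClaim are assumptions, not the actual test helpers:

```go
ginkgo.It("must be functional after its ResourceSlices were wiped", func(ctx context.Context) {
	// Stop the plugin without removing its registration socket,
	// so that only connection monitoring can detect that it is gone.
	stopPlugin(ctx)

	ginkgo.By("wait for ResourceSlice removal")
	gomega.Eventually(ctx, listResources).Should(gomega.BeEmpty(), "ResourceSlices without plugin")

	// Restart the plugin; the kubelet has to reconnect on its own.
	startPlugin(ctx)

	// Verify that the driver serves requests again, e.g. by running
	// a pod which uses a ResourceClaim for this driver.
	ginkgo.By("run pod with claim")
	runPodWithClaim(ctx)
})
```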
That's what I added, without the "waiting for the slices to be updated" (creating slices on startup is covered elsewhere).
Tests look good to me, thanks!
One more thing: here you've mentioned that we need to test the use case(s) with the same registration and service sockets. Would it still make sense to do that?
Yes, worth adding. I had it on my backlog to go through that post before merging.
	//
	// TimedWorkerQueue uses namespace/name as key. We use
	// the driver name as name with no namespace.
	pendingWipes *timedworkers.TimedWorkerQueue
I still think that the added functionality is out of scope for store operations (storing plugins) and should be implemented by separate structures. This approach complicates the code a lot and makes it harder to maintain and test.
The scope of the store has increased. It's no longer limited to just storing entities created elsewhere. We could rename it to "plugin manager", but we already have too many "managers"...
If we split the responsibilities, we are back to having to use different mutexes, with potential time-of-check-time-of-use races. All of that gets avoided by making a single entity responsible for everything related to plugins (tracking registered endpoints, tracking connectedness).
I'm not convinced. I don't see any harm in using mutexes for every structure if the structure and its interface are clear and easy to use. Let's look at the example in my PR, where ResourceSlice cleanup is handled by a separate cleanupHandler structure with two interface methods: cleanupResourceSlices and cancelPendingWipe (which should probably be renamed to cancelCleanupResourceSlices). Those are called from the registration handler when a plugin is registered or unregistered, and by the plugin when the connection status changes. The fact that those methods lock cleanupHandler doesn't create any more of an issue than the fact that the store is locked when plugins are added to or removed from it. Those locks are not exposed outside and are only used internally by cleanupHandler. Can you give an example of when this approach can create time-of-check-time-of-use races?
In my opinion, merging registration and connection monitoring into Store is a big step towards a God object. It's already responsible for three things; what's next — plugin health management? Garbage collection of stale sockets? Updating node resources? The list can be continued easily. Testing this object would also be much harder than testing separate ones, in my opinion.
I think equating this to a "God object" is a bit extreme.
From my perspective this (new) abstraction has a clear set of roles that all fall within the scope of a DRAPluginManager, and these roles logically belong together in a single object:
- It handles the registration / deregistration of DRAPlugins (including wiping their resource slices if necessary), allowing the consumer of this component to not have to worry about these details
- It allows you to get a handle to a specific registered plugin and call methods on it (e.g. NodePrepare/UnprepareResources) when appropriate.
I see no reason to force these two roles into separate abstractions unless we plan to reuse one or the other of them to interface with yet another component in some way. Do you have an additional component in mind that would make keeping these roles in separate abstractions the obvious better choice?
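As a rough summary (not the actual interface; signatures simplified and hypothetical), the two roles could be written as:

```go
type draPluginManagerRoles interface {
	// Role 1: plugin lifecycle, driven by the kubelet's plugin watcher.
	// Deregistration (or a lost connection) eventually wipes the
	// driver's ResourceSlices unless the plugin comes back in time.
	RegisterPlugin(driverName, endpoint string) error
	DeRegisterPlugin(driverName, endpoint string)

	// Role 2: hand out a registered plugin so that callers can invoke
	// NodePrepareResources / NodeUnprepareResources on it.
	GetPlugin(driverName string) (*DRAPlugin, error)
}
```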
I didn't equate it to a "God object". I said it's a big step in that direction, which I still believe it is. Merging registration and connection management functionality into a storage object doesn't look like a good thing to do. Even if the result seemingly makes sense, as you've mentioned above, that doesn't mean that decomposing registration, connection management, or cleanup into separate entities wouldn't make sense. In my opinion it would be easier to understand, maintain, and test.
@@ -22,6 +22,58 @@ should end with a DNS domain that is unique for the plugin. Each time a plugin
starts, it has to delete old sockets if they exist and listen anew under the
same filename.

## Monitoring Plugin Connection |
As this is fully implemented in the DRA code, would it make sense to move the documentation there? Monitoring the service connection is out of scope for pluginwatcher, I believe.
Moving it out of the pluginmanager code would also simplify the approval of this PR, as it could then be approved by DRA approvers alone.
We also need SIG Node approval for the DRA code, so moving the documentation won't simplify approval. Besides, @klueska can also approve this here.
I prefer to keep documentation about kubelet plugin implementation aspects in one place. Perhaps other types will also implement this.
Force-pushed from 347795b to f964121.
//
// It returns an informative error message including the driver name
// with an explanation why the driver is not usable.
func (pm *PluginManager) GetDRAPlugin(driverName string) (*Plugin, error) {
Does it make sense to rename this type to DRAPluginManager, the returned type to DRAPlugin, and then have this function just be called GetPlugin()?
Looks good to me. I'll update the renaming commit...
Done, force-pushed to both PRs.
Force-pushed from f964121 to 6f220fa.
The rest of the system logs information using "driverName" as key in structured logging. The kubelet should do the same. This also gets clarified in the code, together with using a consistent name for a Plugin pointer: "plugin" instead of "client" or "instance". The "New" in NewDRAPluginClient made no sense because it's not constructing anything, and it returns a plugin, not a client -> GetDRAPlugin.
This moves wiping into the plugins store. The advantages are:
- Only a single mutex is needed in the plugin store.
- The code which decides about queuing and canceling wiping has access to all relevant information while that information is protected against concurrent modifications.
- It prepares for making that code more complex ("connection monitoring").

In retrospect it is not clear whether the RegistrationHandler did the right thing in all cases when endpoints got registered and removed concurrently. The RegistrationHandler became a thin, stateless layer on top of the store. Because it's not really needed anymore, the required methods for the plugin manager now get provided directly by the store.

The disadvantage is the slightly more complex initialization, but that was also a problem before which just hadn't been solved: wiping ran without the manager's context. Now it does.
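To illustrate the concurrency concern, here is a hypothetical sketch (stub types, not the real code) of the window that a split design leaves open when registration state and wiping are guarded by separate mutexes:

```go
package sketch

import "sync"

type store struct {
	mu      sync.Mutex
	plugins map[string]bool
}

func (s *store) remove(name string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.plugins, name)
}

type wiper struct {
	mu      sync.Mutex
	pending map[string]bool
}

func (w *wiper) queueWipe(name string) {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.pending[name] = true
}

// deRegister shows the race: between releasing the store mutex and taking
// the wiper mutex, the driver can re-register, yet a wipe still gets
// queued for it afterwards.
func deRegister(s *store, w *wiper, driverName string) {
	s.remove(driverName)
	// <-- nothing is locked here; a concurrent re-registration of
	//     driverName can happen right now ...
	w.queueWipe(driverName)
	// ... and its freshly published ResourceSlices may still get wiped.
}
```

Keeping both pieces of state behind one mutex closes that window.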
After merging with RegistrationHandler, the store abstraction is more than a dumb map. It implements additional logic, so renaming it seems warranted. To avoid confusion with other "plugin managers", the DRA prefix is used. The plugin type gets updated accordingly. This is done in a separate commit for ease of review. The "store" field is kept because that really is just a dumb lookup structure.
Instead of creating the gRPC connection on demand and forcing gRPC to connect, we establish it immediately and rely on gRPC to handle the underlying connection automatically like it usually does. It's not clear what benefit the one second connection timeout had. The way it is now, gRPC calls still fail when the underlying connection cannot be established. Having to have a separate context for establishing that connection just made the code more complex.

The DRAPluginManager is the central component which manages plugins. Making it responsible for creating them reduces the number of places where a DRAPlugin struct needs to be initialized. Doing this in the DRAPluginManager instead of a stand-alone function simplifies the implementation of connection monitoring, because that will be something that is tied to the DRAPluginManager state.
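A sketch of the connection setup under these assumptions (socket path handling simplified, helper name made up); the key point is that there is no blocking dial and no separate connection timeout, so gRPC manages the underlying connection itself:

```go
package dra

import (
	"context"
	"net"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// connect creates the ClientConn once, when the plugin gets registered.
// gRPC connects lazily and transparently re-establishes the connection
// when it drops; individual calls simply fail while the plugin is
// unreachable.
func connect(ctx context.Context, socketPath string) (*grpc.ClientConn, error) {
	return grpc.DialContext(ctx, socketPath,
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithContextDialer(func(ctx context.Context, addr string) (net.Conn, error) {
			return (&net.Dialer{}).DialContext(ctx, "unix", addr)
		}),
	)
}
```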
This ensures that ResourceSlices get removed also when a plugin becomes unresponsive without removing the registration socket. The tests are from kubernetes#131073 by Ed, with some modifications; the implementation is new.
Conceptually TimedWorkerQueue is similar to the current code: it spawns goroutines and cancels them. Using it makes the code a bit shorter, even though the TimedWorkerQueue API could be a bit nicer and more consistent (key string vs. WorkArgs as parameters). Depending on the tainteviction package is a bit odd, which is the reason why TimedWorkerQueue wasn't already used earlier. But there don't seem to be other implementations of this common problem. https://pkg.go.dev/k8s.io/client-go/util/workqueue#TypedDelayingInterface doesn't work because queue entries cannot be removed. This doesn't really solve the problem of tracking goroutines for wiping, because TimedWorkerQueue doesn't support that. But not tracking is arguably better than doing it wrong, and this only affects unit tests, so it should be okay.
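Conceptually the pattern is "schedule a wipe after a grace period, cancel it if the plugin comes back". A stand-in sketch using time.AfterFunc instead of the actual TimedWorkerQueue API:

```go
package dra

import (
	"sync"
	"time"
)

// wipeScheduler is a conceptual stand-in for the pendingWipes queue.
type wipeScheduler struct {
	mu      sync.Mutex
	pending map[string]*time.Timer // key: driver name, no namespace
}

func newWipeScheduler() *wipeScheduler {
	return &wipeScheduler{pending: map[string]*time.Timer{}}
}

// scheduleWipe arranges for the driver's ResourceSlices to be removed
// after the grace period unless the wipe gets canceled first.
func (s *wipeScheduler) scheduleWipe(driverName string, grace time.Duration, wipe func()) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.pending[driverName]; ok {
		return // already scheduled
	}
	s.pending[driverName] = time.AfterFunc(grace, wipe)
}

// cancelWipe gets called when the plugin registers again or reconnects
// before the grace period is over.
func (s *wipeScheduler) cancelWipe(driverName string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if t, ok := s.pending[driverName]; ok {
		t.Stop()
		delete(s.pending, driverName)
	}
}
```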
Force-pushed from 6f220fa to aee4073.
/test pull-kubernetes-e2e-kind-dra-canary pull-kubernetes-e2e-kind-dra-n-1-canary
@pohly: The specified target(s) for /test were not found. The following commands are available to trigger optional jobs: …
/test pull-kubernetes-kind-dra-n-1-canary pull-kubernetes-kind-dra-canary
/test pull-kubernetes-kind-dra-n-1-canary pull-kubernetes-kind-dra-n-2-canary pull-kubernetes-kind-dra-canary
/test pull-kubernetes-kind-dra-n-1-canary pull-kubernetes-kind-dra-n-2-canary
@pohly: The following tests failed, say /retest to rerun all failed tests: …
What type of PR is this?
/kind feature
What this PR does / why we need it:
When a kubelet plugin shuts down without removing its registration socket (for example, because it was forcibly killed or crashed), the kubelet does not clean up and delete ResourceSlices even if the plugin does not come back. With connection monitoring of the service socket, the kubelet recognizes when a plugin becomes unusable and then cleans up accordingly.
Which issue(s) this PR fixes:
Fixes #128696
Replaces #131073
Depends on #132096
Special notes for your reviewer:
Please review commit-by-commit. The initial commits refactor code to make the last commits simpler, so at least #132096 should get merged first.
Does this PR introduce a user-facing change?