Description
What would you like to be added?
Implement the `HealthCheckable` interface for controllers so that their health checks include informer cache readiness. Each controller would then have a real health check that reflects whether it is running and ready to process requests.
Why is this needed?
Currently, controllers that do not define a custom health check function fall back to the default ping health checker, and no controller implements a custom one today. As a result, the check always returns healthy regardless of the controller's actual running state.
kubernetes/cmd/kube-controller-manager/app/controllermanager.go
Lines 762 to 778 in d96cfeb
Although controllers wait for the informer cache to sync before they start processing any requests, we lack visibility into whether a controller is actually ready.
For example, the job-controller waits for its cache to sync in the `jm.Run` function:
kubernetes/pkg/controller/job/job_controller.go
Lines 248 to 258 in bbb4291
If the cache fails to sync, the job-controller routine exits at this point, but the registered ping check still returns healthy. The registered health checks are also exposed as `kubernetes_healthcheck` metrics emitted by KCM through `/metrics/slis`. Custom health checks for controllers would give users better visibility into each controller's runtime health.
A similar issue, #128233, was opened earlier but did not get attention. This new issue focuses on improving the health check function of each controller. The informer cache sync check can be the first, generic check included in every custom health check function.
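
To make the proposal concrete, here is a minimal, self-contained sketch of a cache-sync-based health check next to the ping fallback. It is an illustration under stated assumptions, not the actual KCM code: `HealthChecker`, `HealthCheckable`, `pingChecker`, `cacheSyncChecker`, `fakeController`, and `checkerFor` are all local stand-ins defined here using only the standard library, and the "has synced" functions stand in for whatever the informers of a real controller expose.

```go
package main

import (
	"fmt"
	"net/http"
	"sync/atomic"
)

// HealthChecker is a local stand-in for a named health check.
// It is defined here for illustration and not imported from Kubernetes.
type HealthChecker interface {
	Name() string
	Check(req *http.Request) error
}

// HealthCheckable is a local stand-in for the interface a controller
// would implement to provide its own health check.
type HealthCheckable interface {
	HealthChecker() HealthChecker
}

// pingChecker always reports healthy, like a default ping check.
type pingChecker struct{ name string }

func (p pingChecker) Name() string                { return p.name }
func (p pingChecker) Check(_ *http.Request) error { return nil }

// cacheSyncChecker reports healthy only when every "has synced"
// function returns true.
type cacheSyncChecker struct {
	name   string
	synced []func() bool
}

func (c cacheSyncChecker) Name() string { return c.name }

func (c cacheSyncChecker) Check(_ *http.Request) error {
	for i, hasSynced := range c.synced {
		if !hasSynced() {
			return fmt.Errorf("%s: informer cache %d not synced", c.name, i)
		}
	}
	return nil
}

// fakeController is a hypothetical controller with two informer caches,
// modelled as atomic flags so the example needs no Kubernetes imports.
type fakeController struct {
	jobsSynced atomic.Bool
	podsSynced atomic.Bool
}

// HealthChecker makes fakeController satisfy HealthCheckable.
func (f *fakeController) HealthChecker() HealthChecker {
	return cacheSyncChecker{
		name:   "job-controller",
		synced: []func() bool{f.jobsSynced.Load, f.podsSynced.Load},
	}
}

// checkerFor uses the controller's own check when it provides one and
// falls back to the always-healthy ping check otherwise.
func checkerFor(name string, ctrl any) HealthChecker {
	if hc, ok := ctrl.(HealthCheckable); ok {
		if real := hc.HealthChecker(); real != nil {
			return real
		}
	}
	return pingChecker{name: name}
}

func main() {
	ctrl := &fakeController{}
	check := checkerFor("job-controller", ctrl)
	fmt.Println("before sync:", check.Check(nil))

	ctrl.jobsSynced.Store(true)
	ctrl.podsSynced.Store(true)
	fmt.Println("after sync:", check.Check(nil))

	// A controller without a custom check still gets the ping fallback.
	fmt.Println("ping fallback:", checkerFor("other", struct{}{}).Check(nil))
}
```

With a check of this shape, a controller whose caches never sync would report unhealthy instead of silently passing the ping check. A real implementation would also need to decide how to report a controller that has exited after a failed sync.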