[Controllers Health Check] Implement controllers custom health check function with informer readiness

### What would you like to be added?

Implement the `HealthCheckable` interface for controllers so that their health checks include informer cache readiness. Each controller would then report a real health status that reflects whether it is running and ready to process requests.
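A rough sketch of what such a check could look like is below. It is not the proposed implementation: to stay runnable with the standard library only, `InformerSynced`, `UnnamedHealthChecker` and `HealthCheckable` are local stand-ins that imitate the shape of the corresponding Kubernetes types, and `informerSyncChecker`, `namedSynced` and `jobController` are hypothetical names made up for illustration.

```go
package main

import (
	"fmt"
	"net/http"
)

// InformerSynced is a local stand-in imitating the shape of client-go's
// cache.InformerSynced (a func that reports whether a cache has synced).
type InformerSynced func() bool

// UnnamedHealthChecker and HealthCheckable are local stand-ins for the
// interfaces a controller would implement; they are assumptions here.
type UnnamedHealthChecker interface {
	Check(req *http.Request) error
}

type HealthCheckable interface {
	HealthChecker() UnnamedHealthChecker
}

// namedSynced pairs an informer name with its sync function (hypothetical).
type namedSynced struct {
	name      string
	hasSynced InformerSynced
}

// informerSyncChecker (hypothetical) is unhealthy until every informer synced.
type informerSyncChecker struct {
	informers []namedSynced
}

// Check returns an error naming the first informer whose cache is not synced.
func (c *informerSyncChecker) Check(_ *http.Request) error {
	for _, inf := range c.informers {
		if !inf.hasSynced() {
			return fmt.Errorf("informer cache %q has not synced", inf.name)
		}
	}
	return nil
}

// jobController is a toy controller holding two sync functions.
type jobController struct {
	podStoreSynced InformerSynced
	jobStoreSynced InformerSynced
}

// HealthChecker makes jobController satisfy HealthCheckable.
func (jm *jobController) HealthChecker() UnnamedHealthChecker {
	return &informerSyncChecker{informers: []namedSynced{
		{name: "pods", hasSynced: jm.podStoreSynced},
		{name: "jobs", hasSynced: jm.jobStoreSynced},
	}}
}

func main() {
	podsReady := false
	var hc HealthCheckable = &jobController{
		podStoreSynced: func() bool { return podsReady },
		jobStoreSynced: func() bool { return true },
	}
	fmt.Println(hc.HealthChecker().Check(nil)) // pods cache not synced yet
	podsReady = true
	fmt.Println(hc.HealthChecker().Check(nil)) // all caches synced
}
```

The checker reads the sync functions on every call, so the reported health follows the informer state instead of being fixed at registration time.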

### Why is this needed?

Currently, controllers that do not define a custom health check function fall back to the default ping health checker, and at present no controller implements one. As a result, the health check always returns healthy regardless of the controller's actual running state.
https://github.com/kubernetes/kubernetes/blob/d96cfeb7de7742733c1c83a5cf0f3675b2910d6e/cmd/kube-controller-manager/app/controllermanager.go#L762-L778

Although controllers wait for the informer cache to sync before processing any requests, we lack visibility into whether a controller is actually healthy and ready to process requests from its work queue.

For example, the job-controller waits for its caches to sync in the `jm.Run` function:
https://github.com/kubernetes/kubernetes/blob/bbb42911531af4cd9206e7646c0af3ace157404e/pkg/controller/job/job_controller.go#L248-L258

If the cache fails to sync, the job-controller routine exits here:
https://github.com/kubernetes/kubernetes/blob/bbb42911531af4cd9206e7646c0af3ace157404e/cmd/kube-controller-manager/app/batch.go#L50

But the registered ping check will still return healthy. 
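The gap can be illustrated with a toy model. The sketch below uses only the standard library; `pingCheck` stands in for the default ping checker and `toyController` is a hypothetical, heavily simplified controller (its `Run` does not block or start workers). It is meant only to show why a check tied to the controller's state would catch the early return.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"sync/atomic"
)

// pingCheck stands in for the default ping checker: it is always healthy.
func pingCheck(_ *http.Request) error { return nil }

// toyController is a hypothetical controller that records whether its
// run loop got past the cache sync step.
type toyController struct {
	running atomic.Bool
}

// Run imitates the shape of a controller's Run: it returns early when the
// caches fail to sync and otherwise marks the controller as running.
func (c *toyController) Run(cacheSynced bool) {
	if !cacheSynced {
		return // same early exit as a failed cache sync wait
	}
	c.running.Store(true)
	// workers would be started here in a real controller
}

// Check is a state-aware health check for the toy controller.
func (c *toyController) Check(_ *http.Request) error {
	if !c.running.Load() {
		return errors.New("controller is not running")
	}
	return nil
}

func main() {
	c := &toyController{}
	c.Run(false) // simulate a failed cache sync

	fmt.Println("ping check: ", pingCheck(nil))
	fmt.Println("state check:", c.Check(nil))
}
```

In this model the ping check reports no error after the failed sync, while the state-aware check reports that the controller is not running.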

The registered health checks are also exposed as `kubernetes_healthcheck` metrics emitted by KCM through `/metrics/slis`. Custom health checks for controllers would give users better visibility into each controller's runtime health.

A similar issue was filed earlier but did not get attention: https://github.com/kubernetes/kubernetes/issues/128233. This new issue focuses on improving the health check function for each controller. Informer cache sync can be the first (and generic) check included in every custom health check function.

	check := controllerhealthz.NamedPingChecker(controllerName)
	if ctrl != nil {
		// check if the controller supports and requests a debugHandler
		// and it needs the unsecuredMux to mount the handler onto.
		if debuggable, ok := ctrl.(controller.Debuggable); ok && unsecuredMux != nil {
			if debugHandler := debuggable.DebuggingHandler(); debugHandler != nil {
				basePath := "/debug/controllers/" + controllerName
				unsecuredMux.UnlistedHandle(basePath, http.StripPrefix(basePath, debugHandler))
				unsecuredMux.UnlistedHandlePrefix(basePath+"/", http.StripPrefix(basePath, debugHandler))
			}
		}
		if healthCheckable, ok := ctrl.(controller.HealthCheckable); ok {
			if realCheck := healthCheckable.HealthChecker(); realCheck != nil {
				check = controllerhealthz.NamedHealthChecker(controllerName, realCheck)
			}
		}
	}

	logger.Info("Starting job controller")
	defer logger.Info("Shutting down job controller")

	if !cache.WaitForNamedCacheSync("job", ctx.Done(), jm.podStoreSynced, jm.jobStoreSynced) {
		return
	}

	for i := 0; i < workers; i++ {
		go wait.UntilWithContext(ctx, jm.worker, time.Second)
		go wait.UntilWithContext(ctx, jm.orphanWorker, time.Second)
	}

[Controllers Health Check] Implement controllers custom health check function with informer readiness #132137
