Skip to content

kubelet_pod_start_sli_duration_seconds appears not to match its specification, at least as far as excluding init container runtime #131733

@jim-minter

Description

@jim-minter

What happened?

According to

// PodStartSLIDuration is a Histogram that tracks the duration (in seconds) it takes for a single pod to run,
// excluding the time for image pulling. This metric should reflect the "Pod startup latency SLI" definition
// ref: https://github.com/kubernetes/community/blob/master/sig-scalability/slos/pod_startup_latency.md
//
// The histogram bucket boundaries for pod startup latency metrics, measured in seconds. These are hand-picked
// so as to be roughly exponential but still round numbers in everyday units. This is to minimise the number
// of buckets while allowing accurate measurement of thresholds which might be used in SLOs
// e.g. x% of pods start up within 30 seconds, or 15 minutes, etc.
PodStartSLIDuration = metrics.NewHistogramVec(
&metrics.HistogramOpts{
Subsystem: KubeletSubsystem,
Name: PodStartSLIDurationKey,
Help: "Duration in seconds to start a pod, excluding time to pull images and run init containers, measured from pod creation timestamp to when all its containers are reported as started and observed via watch",
Buckets: podStartupDurationBuckets,
StabilityLevel: metrics.ALPHA,
},
[]string{},
)
and https://github.com/kubernetes/community/blob/master/sig-scalability/slos/pod_startup_latency.md, I expect the kubelet_pod_start_sli_duration_seconds metric to exclude init container runtime from its measurements.

I tested this by creating lots of pods with a 10 second sleep in their init containers, then looked at the metric on the relevant kubelet, and I see this kind of thing:

kubelet_pod_start_sli_duration_seconds_bucket{le="0.5"} 0
kubelet_pod_start_sli_duration_seconds_bucket{le="1"} 0
kubelet_pod_start_sli_duration_seconds_bucket{le="2"} 1
kubelet_pod_start_sli_duration_seconds_bucket{le="3"} 1
kubelet_pod_start_sli_duration_seconds_bucket{le="4"} 1
kubelet_pod_start_sli_duration_seconds_bucket{le="5"} 1
kubelet_pod_start_sli_duration_seconds_bucket{le="6"} 1
kubelet_pod_start_sli_duration_seconds_bucket{le="8"} 1
kubelet_pod_start_sli_duration_seconds_bucket{le="10"} 1
kubelet_pod_start_sli_duration_seconds_bucket{le="20"} 43

What did you expect to happen?

I would expect to see the le="10", le="8", le="6", etc. counters increment in this case, since init container time should not be being included.

How can we reproduce it (as minimally and precisely as possible)?

I used a deployment yaml like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
spec:
  replicas: 50
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      nodeName: $YOUR_NODE_NAME
      initContainers:
      - name: init
        image: busybox
        command: ["sh", "-c", "sleep 10"]
      containers:
      - name: test
        image: k8s.gcr.io/pause:3.9

Then SSH onto $YOUR_NODE_NAME and with an appropriate bearer token, and do:

curl -sk -H "Authorization: Bearer $BEARER_TOKEN" https://localhost:10250/metrics | grep kubelet_pod_start_sli_duration

(https://yuki-nakamura.com/2023/10/15/get-kubelets-metrics-manually/ is a helpful resource for how to create a ServiceAccount + ClusterRoleBinding to get a bearer token to use against Kubelet metrics. It's a little out of date though, you'll also need to follow something like https://stackoverflow.com/questions/73164466/how-to-create-a-secret-for-service-account-using-kubernetes-version-1-24 to create a Secret to lift the bearer token).

Anything else we need to know?

By code inspection, I cannot find anywhere any indication of code that deducts init container time.

I believe the key part of the code is here:

if hasPodStartedSLO(pod) {
podStartingDuration := when.Sub(pod.CreationTimestamp.Time)
imagePullingDuration := state.lastFinishedPulling.Sub(state.firstStartedPulling)
podStartSLOduration := (podStartingDuration - imagePullingDuration).Seconds()
klog.InfoS("Observed pod startup duration",
"pod", klog.KObj(pod),
"podStartSLOduration", podStartSLOduration,
"podStartE2EDuration", podStartingDuration,
"podCreationTimestamp", pod.CreationTimestamp.Time,
"firstStartedPulling", state.firstStartedPulling,
"lastFinishedPulling", state.lastFinishedPulling,
"observedRunningTime", state.observedRunningTime,
"watchObservedRunningTime", when)
metrics.PodStartSLIDuration.WithLabelValues().Observe(podStartSLOduration)
metrics.PodStartTotalDuration.WithLabelValues().Observe(podStartingDuration.Seconds())
state.metricRecorded = true
// if is the first Pod with network track the start values
// these metrics will help to identify problems with the CNI plugin
if !pod.Spec.HostNetwork && !p.firstNetworkPodSeen {
metrics.FirstNetworkPodStartSLIDuration.Set(podStartSLOduration)
p.firstNetworkPodSeen = true
}
}

I read this as calculating t(all containers started) - t(pod created) - (t(last pull finished) - t(first pull started))

I'm also suspicious because I didn't see anything that excludes pods with stateful volumes, although I might have just missed it.

Similarly, I suspect that this metric implementation is currently including time between pod creation and successful scheduling; if this is so, I'm dubious that it well matches the intention of the SLO documentation "schedulable" term.

Kubernetes version

$ kubectl version
Client Version: v1.31.0
Kustomize Version: v5.4.2
Server Version: v1.31.7

Cloud provider

AKS

OS version

# On Linux:
$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.5 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.5 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
$ uname -a
Linux aks-nodepool1-58959923-vmss000000 5.15.0-1087-azure #96-Ubuntu SMP Fri Mar 28 20:31:27 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Install tools

az group create -n test -l canadacentral az aks create -g test -n test az aks get-credentials -g test -n test

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

Metadata

Metadata

Labels

kind/bugCategorizes issue or PR as related to a bug.priority/backlogHigher priority than priority/awaiting-more-evidence.sig/nodeCategorizes an issue or PR as relevant to SIG Node.triage/acceptedIndicates an issue or PR is ready to be actively worked on.

Type

No type

Projects

Status

Triaged

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions