DRA: failed to create pod: "must specify one of: resourceClaimName, resourceClaimTemplateName" #131877

Open
@cyclinder

What happened?

I created a DaemonSet with resourceClaims configured, but its pods cannot be created. The kube-controller-manager (KCM) keeps reporting "must specify one of: resourceClaimName, resourceClaimTemplateName".

I0521 07:08:36.544760       1 event.go:389] "Event occurred" logger="daemonset-controller" object="cyclinder/demo-dra" fieldPath="" kind="DaemonSet" apiVersion="apps/v1" type="Warning" reason="FailedCreate" message="Error creating: Pod \"demo-dra-tz69j\" is invalid: spec.resourceClaims[0]: Invalid value: core.PodResourceClaim{Name:\"demo\", ResourceClaimName:(*string)(nil), ResourceClaimTemplateName:(*string)(nil)}: must specify one of: `resourceClaimName`, `resourceClaimTemplateName`"
E0521 07:08:36.546024       1 daemon_controller.go:329] "Unhandled Error" err="cyclinder/demo-dra failed with : Pod \"demo-dra-tz69j\" is invalid: spec.resourceClaims[0]: Invalid value: core.PodResourceClaim{Name:\"demo\", ResourceClaimName:(*string)(nil), ResourceClaimTemplateName:(*string)(nil)}: must specify one of: `resourceClaimName`, `resourceClaimTemplateName`" logger="UnhandledError"
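
For reference, the validation message refers to the two mutually exclusive ways an entry in a pod's spec.resourceClaims can point at a claim. The fragment below is a minimal sketch of both forms and is not taken from this issue; the names shared-claim and my-existing-claim are hypothetical.

```yaml
# Fragment of a pod spec, not a complete manifest.
resourceClaims:
# Form 1: reference an already existing ResourceClaim by name.
- name: shared-claim                      # hypothetical entry name
  resourceClaimName: my-existing-claim    # hypothetical ResourceClaim object
# Form 2: have a ResourceClaim generated for the pod from a template.
- name: demo
  resourceClaimTemplateName: demo
```

In the rejected pod from the log above, the entry named demo carries neither field, although the DaemonSet template sets resourceClaimTemplateName.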

I ran kubectl apply -f deploy.yaml -v 9 and can see that the resourceClaims were submitted. What is going wrong here?

...
I0521 15:22:51.476955 3327853 round_trippers.go:473] curl -v -XPOST  -H "Accept: application/json" -H "Content-Type: application/json" -H "User-Agent: kubectl/v1.32.3 (linux/amd64) kubernetes/32cc146" 'https://10.20.1.50:6443/apis/apps/v1/namespaces/cyclinder/daemonsets?fieldManager=kubectl-client-side-apply&fieldValidation=Strict'
I0521 15:22:51.484979 3327853 round_trippers.go:560] POST https://10.20.1.50:6443/apis/apps/v1/namespaces/cyclinder/daemonsets?fieldManager=kubectl-client-side-apply&fieldValidation=Strict 201 Created in 7 milliseconds
I0521 15:22:51.485039 3327853 round_trippers.go:577] HTTP Statistics: GetConnection 0 ms ServerProcessing 7 ms Duration 7 ms
I0521 15:22:51.485055 3327853 round_trippers.go:584] Response Headers:
I0521 15:22:51.485075 3327853 round_trippers.go:587]     Audit-Id: 1b97801f-f466-46cb-877f-188e1b800d6e
I0521 15:22:51.485088 3327853 round_trippers.go:587]     Cache-Control: no-cache, private
I0521 15:22:51.485100 3327853 round_trippers.go:587]     Content-Type: application/json
I0521 15:22:51.485113 3327853 round_trippers.go:587]     X-Kubernetes-Pf-Flowschema-Uid: c170b105-c632-487e-800c-2b7e48b70984
I0521 15:22:51.485121 3327853 round_trippers.go:587]     X-Kubernetes-Pf-Prioritylevel-Uid: b1672c8d-17a9-49ac-858a-633e8682d346
I0521 15:22:51.485131 3327853 round_trippers.go:587]     Date: Wed, 21 May 2025 07:22:51 GMT
I0521 15:22:51.485337 3327853 helper.go:246] "Response Body" body=<
	{"kind":"DaemonSet","apiVersion":"apps/v1","metadata":{"name":"demo-dra","namespace":"cyclinder","uid":"337f0574-05cc-4004-987f-6706e4bf75a0","resourceVersion":"207473166","generation":1,"creationTimestamp":"2025-05-21T07:22:51Z","annotations":{"deprecated.daemonset.template.generation":"1","kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"apps/v1\",\"kind\":\"DaemonSet\",\"metadata\":{\"annotations\":{},\"name\":\"demo-dra\",\"namespace\":\"cyclinder\"},\"spec\":{\"revisionHistoryLimit\":10,\"selector\":{\"matchLabels\":{\"app\":\"rdma-test-gpu-tool\"}},\"template\":{\"metadata\":{\"labels\":{\"app\":\"rdma-test-gpu-tool\"}},\"spec\":{\"containers\":[{\"env\":[{\"name\":\"ENV_POD_NAME\",\"valueFrom\":{\"fieldRef\":{\"apiVersion\":\"v1\",\"fieldPath\":\"metadata.name\"}}},{\"name\":\"ENV_LOCAL_NODE_IP\",\"valueFrom\":{\"fieldRef\":{\"apiVersion\":\"v1\",\"fieldPath\":\"status.hostIP\"}}},{\"name\":\"ENV_LOCAL_NODE_NAME\",\"valueFrom\":{\"fieldRef\":{\"apiVersion\":\"v1\",\"fieldPath\":\"spec.nodeName\"}}},{\"name\":\"ENV_SERVICE_NAME\",\"value\":\"rdma-test-gpu-tool\"},{\"name\":\"ENV_POD_NAMESPACE\",\"valueFrom\":{\"fieldRef\":{\"apiVersion\":\"v1\",\"fieldPath\":\"metadata.namespace\"}}}],\"image\":\"ghcr.m.daocloud.io/spidernet-io/rdma-tools:12.5.1-898cf75813bf866d1ba576ce7484065c0fd237e8\",\"imagePullPolicy\":\"IfNotPresent\",\"name\":\"rdma-test\",\"ports\":[{\"containerPort\":22,\"name\":\"ssh\",\"protocol\":\"TCP\"}],\"readinessProbe\":{\"exec\":{\"command\":[\"sh\",\"-c\",\"ls 
/tmp/ready\"]},\"failureThreshold\":3,\"periodSeconds\":10,\"successThreshold\":1,\"timeoutSeconds\":1},\"resources\":{\"limits\":{\"nvidia.com/gpu\":1}}}],\"nodeSelector\":{\"kubernetes.io/os\":\"linux\"},\"resourceClaims\":[{\"name\":\"demo-nri\",\"resourceClaimTemplateName\":\"demo-nri\"}]}}}}\n"},"managedFields":[{"manager":"kubectl-client-side-apply","operation":"Update","apiVersion":"apps/v1","time":"2025-05-21T07:22:51Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:deprecated.daemonset.template.generation":{},"f:kubectl.kubernetes.io/last-applied-configuration":{}}},"f:spec":{"f:revisionHistoryLimit":{},"f:selector":{},"f:template":{"f:metadata":{"f:labels":{".":{},"f:app":{}}},"f:spec":{"f:containers":{"k:{\"name\":\"rdma-test\"}":{".":{},"f:env":{".":{},"k:{\"name\":\"ENV_LOCAL_NODE_IP\"}":{".":{},"f:name":{},"f:valueFrom":{".":{},"f:fieldRef":{}}},"k:{\"name\":\"ENV_LOCAL_NODE_NAME\"}":{".":{},"f:name":{},"f:valueFrom":{".":{},"f:fieldRef":{}}},"k:{\"name\":\"ENV_POD_NAME\"}":{".":{},"f:name":{},"f:valueFrom":{".":{},"f:fieldRef":{}}},"k:{\"name\":\"ENV_POD_NAMESPACE\"}":{".":{},"f:name":{},"f:valueFrom":{".":{},"f:fieldRef":{}}},"k:{\"name\":\"ENV_SERVICE_NAME\"}":{".":{},"f:name":{},"f:value":{}}},"f:image":{},"f:imagePullPolicy":{},"f:name":{},"f:ports":{".":{},"k:{\"containerPort\":22,\"protocol\":\"TCP\"}":{".":{},"f:containerPort":{},"f:name":{},"f:protocol":{}}},"f:readinessProbe":{".":{},"f:exec":{".":{},"f:command":{}},"f:failureThreshold":{},"f:periodSeconds":{},"f:successThreshold":{},"f:timeoutSeconds":{}},"f:resources":{".":{},"f:limits":{".":{},"f:nvidia.com/gpu":{}}},"f:terminationMessagePath":{},"f:terminationMessagePolicy":{}}},"f:dnsPolicy":{},"f:nodeSelector":{},"f:resourceClaims":{".":{},"k:{\"name\":\"demo-nri\"}":{".":{},"f:name":{},"f:resourceClaimTemplateName":{}}},"f:restartPolicy":{},"f:schedulerName":{},"f:securityContext":{},"f:terminationGracePeriodSeconds":{}}},"f:updateStrategy":{"f:rollingUpda
te":{".":{},"f:maxSurge":{},"f:maxUnavailable":{}},"f:type":{}}}}}]},"spec":{"selector":{"matchLabels":{"app":"rdma-test-gpu-tool"}},"template":{"metadata":{"creationTimestamp":null,"labels":{"app":"rdma-test-gpu-tool"}},"spec":{"containers":[{"name":"rdma-test","image":"ghcr.m.daocloud.io/spidernet-io/rdma-tools:12.5.1-898cf75813bf866d1ba576ce7484065c0fd237e8","ports":[{"name":"ssh","containerPort":22,"protocol":"TCP"}],"env":[{"name":"ENV_POD_NAME","valueFrom":{"fieldRef":{"apiVersion":"v1","fieldPath":"metadata.name"}}},{"name":"ENV_LOCAL_NODE_IP","valueFrom":{"fieldRef":{"apiVersion":"v1","fieldPath":"status.hostIP"}}},{"name":"ENV_LOCAL_NODE_NAME","valueFrom":{"fieldRef":{"apiVersion":"v1","fieldPath":"spec.nodeName"}}},{"name":"ENV_SERVICE_NAME","value":"rdma-test-gpu-tool"},{"name":"ENV_POD_NAMESPACE","valueFrom":{"fieldRef":{"apiVersion":"v1","fieldPath":"metadata.namespace"}}}],"resources":{"limits":{"nvidia.com/gpu":"1"}},"readinessProbe":{"exec":{"command":["sh","-c","ls /tmp/ready"]},"timeoutSeconds":1,"periodSeconds":10,"successThreshold":1,"failureThreshold":3},"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File","imagePullPolicy":"IfNotPresent"}],"restartPolicy":"Always","terminationGracePeriodSeconds":30,"dnsPolicy":"ClusterFirst","nodeSelector":{"kubernetes.io/os":"linux"},"securityContext":{},"schedulerName":"default-scheduler","resourceClaims":[{"name":"demo","resourceClaimTemplateName":"demo"}]}},"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"maxUnavailable":1,"maxSurge":0}},"revisionHistoryLimit":10},"status":{"currentNumberScheduled":0,"numberMisscheduled":0,"desiredNumberScheduled":0,"numberReady":0}}
 >
daemonset.apps/demo-dra created
I0521 15:22:51.485960 3327853 apply.go:548] Running apply post-processor function

Note: I have already enabled the DynamicResourceAllocation (DRA) feature gate.

What did you expect to happen?

The pod should be created successfully.

How can we reproduce it (as minimally and precisely as possible)?

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: demo-dra 
  namespace: cyclinder
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: rdma-test-gpu-tool
  template:
    metadata:
      labels:
        app: rdma-test-gpu-tool
    spec:
      containers:
      - env:
        - name: ENV_POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: ENV_LOCAL_NODE_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.hostIP
        - name: ENV_LOCAL_NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: ENV_SERVICE_NAME
          value: rdma-test-gpu-tool
        - name: ENV_POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        image: ghcr.m.daocloud.io/spidernet-io/rdma-tools:12.5.1-898cf75813bf866d1ba576ce7484065c0fd237e8
        imagePullPolicy: IfNotPresent
        name: rdma-test
        ports:
        - containerPort: 22
          name: ssh
          protocol: TCP
        readinessProbe:
          exec:
            command:
            - sh
            - -c
            - ls /tmp/ready
          failureThreshold: 3
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            nvidia.com/gpu: 1
      nodeSelector:
        kubernetes.io/os: linux
      resourceClaims:
      - name: demo
        resourceClaimTemplateName: "demo"
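
The manifest above references a ResourceClaimTemplate named demo, which the issue does not include. Below is a minimal sketch of what such an object could look like on a v1.32 cluster, assuming the resource.k8s.io/v1beta1 API is served. The request name and the deviceClassName value are hypothetical placeholders, not values from this report.

```yaml
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: demo                  # matches resourceClaimTemplateName in the DaemonSet
  namespace: cyclinder        # must be in the same namespace as the pods
spec:
  spec:
    devices:
      requests:
      - name: device          # hypothetical request name
        deviceClassName: example-device-class   # hypothetical DeviceClass
```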

Anything else we need to know?

  • kube-apiserver:
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: kube-apiserver
    tier: control-plane
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-apiserver
    ...
    - --feature-gates=DynamicResourceAllocation=true
    - --runtime-config=api/beta=true
  • kube-controller-manager:
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: kube-controller-manager
    tier: control-plane
  name: kube-controller-manager
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-controller-manager
    ...
    - --feature-gates=DynamicResourceAllocation=true
    - --v=7
  • kube-scheduler:
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: kube-scheduler
    tier: control-plane
  name: kube-scheduler
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-scheduler
    ...
    - --feature-gates=DynamicResourceAllocation=true
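
The excerpts above cover the three control-plane components. The Kubernetes DRA documentation for this release also asks for the same feature gate on the kubelet, which the issue does not show. Below is a minimal sketch of the corresponding kubelet configuration fragment. It is an assumption about what a complete setup would contain, not a diagnosis of this failure.

```yaml
# Kubelet configuration file fragment (assumed, not shown in the issue).
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  DynamicResourceAllocation: true
```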

Kubernetes version

$ kubectl version
Client Version: v1.32.3
Kustomize Version: v5.5.0
Server Version: v1.32.3

Metadata

Assignees: No one assigned
Labels: kind/bug, needs-triage, sig/node, wg/device-management
Project status: In progress; Triage
Milestone: No milestone