Description
What happened?
I created a DaemonSet with resourceClaims configured, but its pods cannot be created. The kube-controller-manager (KCM) keeps reporting "must specify one of: `resourceClaimName`, `resourceClaimTemplateName`":
I0521 07:08:36.544760 1 event.go:389] "Event occurred" logger="daemonset-controller" object="cyclinder/demo-dra" fieldPath="" kind="DaemonSet" apiVersion="apps/v1" type="Warning" reason="FailedCreate" message="Error creating: Pod \"demo-dra-tz69j\" is invalid: spec.resourceClaims[0]: Invalid value: core.PodResourceClaim{Name:\"demo\", ResourceClaimName:(*string)(nil), ResourceClaimTemplateName:(*string)(nil)}: must specify one of: `resourceClaimName`, `resourceClaimTemplateName`"
E0521 07:08:36.546024 1 daemon_controller.go:329] "Unhandled Error" err="cyclinder/demo-dra failed with : Pod \"demo-dra-tz69j\" is invalid: spec.resourceClaims[0]: Invalid value: core.PodResourceClaim{Name:\"demo\", ResourceClaimName:(*string)(nil), ResourceClaimTemplateName:(*string)(nil)}: must specify one of: `resourceClaimName`, `resourceClaimTemplateName`" logger="UnhandledError"
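The message in these logs describes an either/or rule: each entry in spec.resourceClaims has to reference a claim or a claim template. A minimal Python sketch of that rule follows. The function name and the wording of the first error are illustrative assumptions, not the actual Kubernetes validation code; only the second message is copied from the log above.

```python
from typing import List, Optional


def validate_pod_resource_claim(
    name: str,
    resource_claim_name: Optional[str] = None,
    resource_claim_template_name: Optional[str] = None,
) -> List[str]:
    """Return validation errors for one resourceClaims entry (empty if valid)."""
    errors: List[str] = []
    has_claim = resource_claim_name is not None
    has_template = resource_claim_template_name is not None

    if has_claim and has_template:
        # Illustrative wording; an assumption, not taken from the issue.
        errors.append(f"{name}: only one of the two references may be set")
    if not has_claim and not has_template:
        # Wording copied from the KCM log in this issue.
        errors.append(
            "must specify one of: `resourceClaimName`, `resourceClaimTemplateName`"
        )
    return errors


# The pod in the log had both fields nil, which triggers the reported error.
log_case = validate_pod_resource_claim("demo")
# The DaemonSet template sets resourceClaimTemplateName, which passes the rule.
manifest_case = validate_pod_resource_claim(
    "demo", resource_claim_template_name="demo"
)
```

Under this reading, the claim entry in the rejected pod had lost the resourceClaimTemplateName that the DaemonSet template declares; the sketch does not explain where it was lost.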
I ran kubectl apply -f deploy.yaml -v 9 and can see that the resourceClaims were submitted to the API server. What is going wrong here?
...
I0521 15:22:51.476955 3327853 round_trippers.go:473] curl -v -XPOST -H "Accept: application/json" -H "Content-Type: application/json" -H "User-Agent: kubectl/v1.32.3 (linux/amd64) kubernetes/32cc146" 'https://10.20.1.50:6443/apis/apps/v1/namespaces/cyclinder/daemonsets?fieldManager=kubectl-client-side-apply&fieldValidation=Strict'
I0521 15:22:51.484979 3327853 round_trippers.go:560] POST https://10.20.1.50:6443/apis/apps/v1/namespaces/cyclinder/daemonsets?fieldManager=kubectl-client-side-apply&fieldValidation=Strict 201 Created in 7 milliseconds
I0521 15:22:51.485039 3327853 round_trippers.go:577] HTTP Statistics: GetConnection 0 ms ServerProcessing 7 ms Duration 7 ms
I0521 15:22:51.485055 3327853 round_trippers.go:584] Response Headers:
I0521 15:22:51.485075 3327853 round_trippers.go:587] Audit-Id: 1b97801f-f466-46cb-877f-188e1b800d6e
I0521 15:22:51.485088 3327853 round_trippers.go:587] Cache-Control: no-cache, private
I0521 15:22:51.485100 3327853 round_trippers.go:587] Content-Type: application/json
I0521 15:22:51.485113 3327853 round_trippers.go:587] X-Kubernetes-Pf-Flowschema-Uid: c170b105-c632-487e-800c-2b7e48b70984
I0521 15:22:51.485121 3327853 round_trippers.go:587] X-Kubernetes-Pf-Prioritylevel-Uid: b1672c8d-17a9-49ac-858a-633e8682d346
I0521 15:22:51.485131 3327853 round_trippers.go:587] Date: Wed, 21 May 2025 07:22:51 GMT
I0521 15:22:51.485337 3327853 helper.go:246] "Response Body" body=<
{"kind":"DaemonSet","apiVersion":"apps/v1","metadata":{"name":"demo-dra","namespace":"cyclinder","uid":"337f0574-05cc-4004-987f-6706e4bf75a0","resourceVersion":"207473166","generation":1,"creationTimestamp":"2025-05-21T07:22:51Z","annotations":{"deprecated.daemonset.template.generation":"1","kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"apps/v1\",\"kind\":\"DaemonSet\",\"metadata\":{\"annotations\":{},\"name\":\"demo-dra\",\"namespace\":\"cyclinder\"},\"spec\":{\"revisionHistoryLimit\":10,\"selector\":{\"matchLabels\":{\"app\":\"rdma-test-gpu-tool\"}},\"template\":{\"metadata\":{\"labels\":{\"app\":\"rdma-test-gpu-tool\"}},\"spec\":{\"containers\":[{\"env\":[{\"name\":\"ENV_POD_NAME\",\"valueFrom\":{\"fieldRef\":{\"apiVersion\":\"v1\",\"fieldPath\":\"metadata.name\"}}},{\"name\":\"ENV_LOCAL_NODE_IP\",\"valueFrom\":{\"fieldRef\":{\"apiVersion\":\"v1\",\"fieldPath\":\"status.hostIP\"}}},{\"name\":\"ENV_LOCAL_NODE_NAME\",\"valueFrom\":{\"fieldRef\":{\"apiVersion\":\"v1\",\"fieldPath\":\"spec.nodeName\"}}},{\"name\":\"ENV_SERVICE_NAME\",\"value\":\"rdma-test-gpu-tool\"},{\"name\":\"ENV_POD_NAMESPACE\",\"valueFrom\":{\"fieldRef\":{\"apiVersion\":\"v1\",\"fieldPath\":\"metadata.namespace\"}}}],\"image\":\"ghcr.m.daocloud.io/spidernet-io/rdma-tools:12.5.1-898cf75813bf866d1ba576ce7484065c0fd237e8\",\"imagePullPolicy\":\"IfNotPresent\",\"name\":\"rdma-test\",\"ports\":[{\"containerPort\":22,\"name\":\"ssh\",\"protocol\":\"TCP\"}],\"readinessProbe\":{\"exec\":{\"command\":[\"sh\",\"-c\",\"ls 
/tmp/ready\"]},\"failureThreshold\":3,\"periodSeconds\":10,\"successThreshold\":1,\"timeoutSeconds\":1},\"resources\":{\"limits\":{\"nvidia.com/gpu\":1}}}],\"nodeSelector\":{\"kubernetes.io/os\":\"linux\"},\"resourceClaims\":[{\"name\":\"demo-nri\",\"resourceClaimTemplateName\":\"demo-nri\"}]}}}}\n"},"managedFields":[{"manager":"kubectl-client-side-apply","operation":"Update","apiVersion":"apps/v1","time":"2025-05-21T07:22:51Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:deprecated.daemonset.template.generation":{},"f:kubectl.kubernetes.io/last-applied-configuration":{}}},"f:spec":{"f:revisionHistoryLimit":{},"f:selector":{},"f:template":{"f:metadata":{"f:labels":{".":{},"f:app":{}}},"f:spec":{"f:containers":{"k:{\"name\":\"rdma-test\"}":{".":{},"f:env":{".":{},"k:{\"name\":\"ENV_LOCAL_NODE_IP\"}":{".":{},"f:name":{},"f:valueFrom":{".":{},"f:fieldRef":{}}},"k:{\"name\":\"ENV_LOCAL_NODE_NAME\"}":{".":{},"f:name":{},"f:valueFrom":{".":{},"f:fieldRef":{}}},"k:{\"name\":\"ENV_POD_NAME\"}":{".":{},"f:name":{},"f:valueFrom":{".":{},"f:fieldRef":{}}},"k:{\"name\":\"ENV_POD_NAMESPACE\"}":{".":{},"f:name":{},"f:valueFrom":{".":{},"f:fieldRef":{}}},"k:{\"name\":\"ENV_SERVICE_NAME\"}":{".":{},"f:name":{},"f:value":{}}},"f:image":{},"f:imagePullPolicy":{},"f:name":{},"f:ports":{".":{},"k:{\"containerPort\":22,\"protocol\":\"TCP\"}":{".":{},"f:containerPort":{},"f:name":{},"f:protocol":{}}},"f:readinessProbe":{".":{},"f:exec":{".":{},"f:command":{}},"f:failureThreshold":{},"f:periodSeconds":{},"f:successThreshold":{},"f:timeoutSeconds":{}},"f:resources":{".":{},"f:limits":{".":{},"f:nvidia.com/gpu":{}}},"f:terminationMessagePath":{},"f:terminationMessagePolicy":{}}},"f:dnsPolicy":{},"f:nodeSelector":{},"f:resourceClaims":{".":{},"k:{\"name\":\"demo-nri\"}":{".":{},"f:name":{},"f:resourceClaimTemplateName":{}}},"f:restartPolicy":{},"f:schedulerName":{},"f:securityContext":{},"f:terminationGracePeriodSeconds":{}}},"f:updateStrategy":{"f:rollingUpda
te":{".":{},"f:maxSurge":{},"f:maxUnavailable":{}},"f:type":{}}}}}]},"spec":{"selector":{"matchLabels":{"app":"rdma-test-gpu-tool"}},"template":{"metadata":{"creationTimestamp":null,"labels":{"app":"rdma-test-gpu-tool"}},"spec":{"containers":[{"name":"rdma-test","image":"ghcr.m.daocloud.io/spidernet-io/rdma-tools:12.5.1-898cf75813bf866d1ba576ce7484065c0fd237e8","ports":[{"name":"ssh","containerPort":22,"protocol":"TCP"}],"env":[{"name":"ENV_POD_NAME","valueFrom":{"fieldRef":{"apiVersion":"v1","fieldPath":"metadata.name"}}},{"name":"ENV_LOCAL_NODE_IP","valueFrom":{"fieldRef":{"apiVersion":"v1","fieldPath":"status.hostIP"}}},{"name":"ENV_LOCAL_NODE_NAME","valueFrom":{"fieldRef":{"apiVersion":"v1","fieldPath":"spec.nodeName"}}},{"name":"ENV_SERVICE_NAME","value":"rdma-test-gpu-tool"},{"name":"ENV_POD_NAMESPACE","valueFrom":{"fieldRef":{"apiVersion":"v1","fieldPath":"metadata.namespace"}}}],"resources":{"limits":{"nvidia.com/gpu":"1"}},"readinessProbe":{"exec":{"command":["sh","-c","ls /tmp/ready"]},"timeoutSeconds":1,"periodSeconds":10,"successThreshold":1,"failureThreshold":3},"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File","imagePullPolicy":"IfNotPresent"}],"restartPolicy":"Always","terminationGracePeriodSeconds":30,"dnsPolicy":"ClusterFirst","nodeSelector":{"kubernetes.io/os":"linux"},"securityContext":{},"schedulerName":"default-scheduler","resourceClaims":[{"name":"demo","resourceClaimTemplateName":"demo"}]}},"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"maxUnavailable":1,"maxSurge":0}},"revisionHistoryLimit":10},"status":{"currentNumberScheduled":0,"numberMisscheduled":0,"desiredNumberScheduled":0,"numberReady":0}}
>
daemonset.apps/demo-dra created
I0521 15:22:51.485960 3327853 apply.go:548] Running apply post-processor function
Note: I have already enabled the DRA feature gate (DynamicResourceAllocation).
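To compare what was submitted with what the API server stored, the resourceClaims can be pulled out of a saved response body. The sketch below is one way to do that in Python. The helper names are hypothetical, and the sample is a hand-trimmed stand-in for the real body that keeps only the two resourceClaims lists with the values that appear in the log above.

```python
import json
from typing import Any, Dict, List

LAST_APPLIED = "kubectl.kubernetes.io/last-applied-configuration"


def _pod_resource_claims(daemonset: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return spec.template.spec.resourceClaims, or [] if absent."""
    pod_spec = daemonset.get("spec", {}).get("template", {}).get("spec", {})
    return pod_spec.get("resourceClaims", [])


def stored_resource_claims(response_body: str) -> List[Dict[str, Any]]:
    """Claims in the DaemonSet spec returned by the API server."""
    return _pod_resource_claims(json.loads(response_body))


def applied_resource_claims(response_body: str) -> List[Dict[str, Any]]:
    """Claims recorded in the last-applied-configuration annotation."""
    obj = json.loads(response_body)
    raw = obj.get("metadata", {}).get("annotations", {}).get(LAST_APPLIED)
    return _pod_resource_claims(json.loads(raw)) if raw else []


# Hand-trimmed sample (assumption): only the fields needed for the comparison.
applied = {"spec": {"template": {"spec": {"resourceClaims": [
    {"name": "demo-nri", "resourceClaimTemplateName": "demo-nri"}]}}}}
sample_body = json.dumps({
    "kind": "DaemonSet",
    "metadata": {"annotations": {LAST_APPLIED: json.dumps(applied)}},
    "spec": {"template": {"spec": {"resourceClaims": [
        {"name": "demo", "resourceClaimTemplateName": "demo"}]}}},
})

stored = stored_resource_claims(sample_body)
submitted = applied_resource_claims(sample_body)
```

In the pasted body both lists carry a resourceClaimTemplateName, so the field survives DaemonSet creation. The annotation names the claim demo-nri while the stored spec names it demo, which suggests the log was captured from, or edited after, a slightly different manifest than the one in the reproduction steps.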
What did you expect to happen?
The DaemonSet's pods are created successfully.
How can we reproduce it (as minimally and precisely as possible)?
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: demo-dra
  namespace: cyclinder
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: rdma-test-gpu-tool
  template:
    metadata:
      labels:
        app: rdma-test-gpu-tool
    spec:
      containers:
      - env:
        - name: ENV_POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: ENV_LOCAL_NODE_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.hostIP
        - name: ENV_LOCAL_NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: ENV_SERVICE_NAME
          value: rdma-test-gpu-tool
        - name: ENV_POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        image: ghcr.m.daocloud.io/spidernet-io/rdma-tools:12.5.1-898cf75813bf866d1ba576ce7484065c0fd237e8
        imagePullPolicy: IfNotPresent
        name: rdma-test
        ports:
        - containerPort: 22
          name: ssh
          protocol: TCP
        readinessProbe:
          exec:
            command:
            - sh
            - -c
            - ls /tmp/ready
          failureThreshold: 3
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            nvidia.com/gpu: 1
      nodeSelector:
        kubernetes.io/os: linux
      resourceClaims:
      - name: demo
        resourceClaimTemplateName: "demo"
Anything else we need to know?
- kube-apiserver:
  apiVersion: v1
  kind: Pod
  metadata:
    creationTimestamp: null
    labels:
      component: kube-apiserver
      tier: control-plane
    name: kube-apiserver
    namespace: kube-system
  spec:
    containers:
    - command:
      - kube-apiserver
      ...
      - --feature-gates=DynamicResourceAllocation=true
      - --runtime-config=api/beta=true
- kube-controller-manager:
  apiVersion: v1
  kind: Pod
  metadata:
    creationTimestamp: null
    labels:
      component: kube-controller-manager
      tier: control-plane
    name: kube-controller-manager
    namespace: kube-system
  spec:
    containers:
    - command:
      - kube-controller-manager
      ...
      - --feature-gates=DynamicResourceAllocation=true
      - --v=7
- kube-scheduler:
  apiVersion: v1
  kind: Pod
  metadata:
    creationTimestamp: null
    labels:
      component: kube-scheduler
      tier: control-plane
    name: kube-scheduler
    namespace: kube-system
  spec:
    containers:
    - command:
      - kube-scheduler
      ...
      - --feature-gates=DynamicResourceAllocation=true
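A quick way to double-check that the gate is set identically on each component is to parse the --feature-gates flag out of every command list. The Python sketch below does that for the flags quoted above. The helper name is hypothetical, and the command lists contain only the flags shown in this issue (the elided "..." arguments are not reconstructed).

```python
from typing import Dict, List


def parse_feature_gates(command: List[str]) -> Dict[str, bool]:
    """Collect Name=bool pairs from every --feature-gates=... argument."""
    gates: Dict[str, bool] = {}
    for arg in command:
        if not arg.startswith("--feature-gates="):
            continue
        # Everything after the first "=" is a comma-separated list of pairs.
        for pair in arg.split("=", 1)[1].split(","):
            if not pair:
                continue
            key, _, value = pair.partition("=")
            gates[key.strip()] = value.strip().lower() == "true"
    return gates


# Only the flags visible in the manifests above; other arguments are elided.
components = {
    "kube-apiserver": [
        "kube-apiserver",
        "--feature-gates=DynamicResourceAllocation=true",
        "--runtime-config=api/beta=true",
    ],
    "kube-controller-manager": [
        "kube-controller-manager",
        "--feature-gates=DynamicResourceAllocation=true",
        "--v=7",
    ],
    "kube-scheduler": [
        "kube-scheduler",
        "--feature-gates=DynamicResourceAllocation=true",
    ],
}

dra_enabled = {
    name: parse_feature_gates(cmd).get("DynamicResourceAllocation", False)
    for name, cmd in components.items()
}
```

For the three manifests shown, the gate is enabled everywhere, so a missing gate on these components does not look like the cause.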
Kubernetes version
root@10-20-1-50:/home/guoqifeng/dra/manifests# kubectl version
Client Version: v1.32.3
Kustomize Version: v5.5.0
Server Version: v1.32.3