
DRA: Pod termination is stuck when DRA Driver is stopped #129402

Closed
@mochizuki875

Description


What happened?

A Pod that has been allocated a device remains in the Terminating status when the DRA Driver is stopped.
I don't know whether this is intentional or a bug.

What did you expect to happen?

The Pod is completely terminated.

How can we reproduce it (as minimally and precisely as possible)?

We can reproduce it using the dra-example-driver.
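
For reference, the dra-example-driver demo runs on a kind cluster with the DynamicResourceAllocation feature gate and the resource.k8s.io/v1beta1 API enabled. A minimal kind configuration along those lines (an illustrative sketch with a hypothetical file name; the demo ships its own cluster config) is:

kind-dra-cluster.yaml

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
# Enable the DRA feature gate on all components.
featureGates:
  DynamicResourceAllocation: true
# Enable the beta resource.k8s.io API group on the API server.
runtimeConfig:
  "resource.k8s.io/v1beta1": "true"
nodes:
- role: control-plane
- role: worker
$ kind create cluster --config kind-dra-cluster.yaml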

Summary

  1. Install the DRA Driver (dra-example-driver) and create a DeviceClass.
  2. Create a ResourceClaimTemplate.
  3. Deploy a Pod that is allocated a device via the ResourceClaimTemplate.
  4. Stop the DRA Driver.
  5. Delete the Pod.
  6. The Pod remains in the Terminating status.

Procedure

Install the DRA Driver and create a DeviceClass by following the dra-example-driver demo.
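
The demo's DeviceClass is named gpu.example.com and selects the devices published by the example driver. A sketch of such a DeviceClass (illustrative; the demo provides the actual manifest) looks roughly like:

apiVersion: resource.k8s.io/v1beta1
kind: DeviceClass
metadata:
  name: gpu.example.com
spec:
  selectors:
  # Match only devices published by the gpu.example.com driver.
  - cel:
      expression: device.driver == "gpu.example.com"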

Create a ResourceClaimTemplate.

resource-claim-template-0.yaml

apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: single-gpu
spec:
  spec:
    devices:
      requests:
      - name: gpu
        deviceClassName: gpu.example.com
$ kubectl apply -f resource-claim-template-0.yaml

Deploy a Pod that is allocated a device via the ResourceClaimTemplate.

sample-pod-0.yaml

apiVersion: v1
kind: Pod
metadata:
  name: sample-pod-0
  labels:
    app: sample-pod-0
spec:
  containers:
  - name: ctr0
    image: ubuntu:22.04
    command: ["bash", "-c"]
    args: ["export; trap 'exit 0' TERM; sleep 9999 & wait"]
    resources:
      claims:
      - name: gpu
  resourceClaims:
  - name: gpu
    resourceClaimTemplateName: single-gpu
$ kubectl apply -f sample-pod-0.yaml

The current status of the cluster is as follows.

$ kubectl get deviceclasses,resourceclaimtemplates,resourceclaim,pods
NAME                                          AGE
deviceclass.resource.k8s.io/gpu.example.com   54m

NAME                                               AGE
resourceclaimtemplate.resource.k8s.io/single-gpu   85s

NAME                                                   STATE                AGE
resourceclaim.resource.k8s.io/sample-pod-0-gpu-hxsn7   allocated,reserved   77s

NAME               READY   STATUS    RESTARTS   AGE
pod/sample-pod-0   1/1     Running   0          77s

Stop the DRA Driver.
In this case, we can uninstall the dra-example-driver with Helm.

$ helm -n dra-example-driver uninstall dra-example-driver
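
Before deleting the Pod, we can confirm the driver's kubelet plugin pods are gone (the namespace name follows the demo; adjust if yours differs), which should report that no resources are found once the uninstall completes:

$ kubectl get pods -n dra-example-driver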

Delete the Pod; its status remains Terminating.

$ kubectl delete po sample-pod-0
pod "sample-pod-0" deleted
(stuck here...)

$ kubectl get pod
NAME           READY   STATUS        RESTARTS   AGE
sample-pod-0   0/1     Terminating   0          17m
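
As a workaround (not a fix), the stuck Pod object can be force-deleted, which removes it from the API server without waiting for the kubelet; note that this does not make the kubelet unprepare the claim's resources on the node:

$ kubectl delete pod sample-pod-0 --grace-period=0 --force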

Anything else we need to know?

The kubelet log shows the following error.

# journalctl -xu kubelet
...
Dec 26 05:35:41 kind-v1.32.0-worker kubelet[231]: I1226 05:35:41.323475     231 kubelet.go:2490] "SyncLoop DELETE" source="api" pods=["default/sample-pod-0"]
Dec 26 05:35:41 kind-v1.32.0-worker kubelet[231]: I1226 05:35:41.323611     231 kuberuntime_container.go:809] "Killing container with a grace period" pod="default/sample-pod-0" podUID="5598e74f-08ff-40ab-aba3-fa811874f9dc" containerName="ctr0" containerID="containerd://f42f59f5684ff9be1a0ee57d70231456530e40e87e652ddf0b72c7639a544a05" gracePeriod=30
Dec 26 05:35:41 kind-v1.32.0-worker kubelet[231]: E1226 05:35:41.415637     231 pod_workers.go:1301] "Error syncing pod, skipping" err="get gRPC client for DRA driver gpu.example.com: plugin name gpu.example.com not found in the list of registered DRA plugins" pod="default/sample-pod-0" podUID="5598e74f-08ff-40ab-aba3-fa811874f9dc"
Dec 26 05:35:42 kind-v1.32.0-worker kubelet[231]: I1226 05:35:42.032843     231 generic.go:358] "Generic (PLEG): container finished" podID="5598e74f-08ff-40ab-aba3-fa811874f9dc" containerID="f42f59f5684ff9be1a0ee57d70231456530e40e87e652ddf0b72c7639a544a05" exitCode=0
Dec 26 05:35:42 kind-v1.32.0-worker kubelet[231]: I1226 05:35:42.032885     231 kubelet.go:2506] "SyncLoop (PLEG): event for pod" pod="default/sample-pod-0" event={"ID":"5598e74f-08ff-40ab-aba3-fa811874f9dc","Type":"ContainerDied","Data":"f42f59f5684ff9be1a0ee57d70231456530e40e87e652ddf0b72c7639a544a05"}
Dec 26 05:35:42 kind-v1.32.0-worker kubelet[231]: I1226 05:35:42.032901     231 kubelet.go:2506] "SyncLoop (PLEG): event for pod" pod="default/sample-pod-0" event={"ID":"5598e74f-08ff-40ab-aba3-fa811874f9dc","Type":"ContainerDied","Data":"220eab5b9bc2277f4ac7777dabb82a92bee9d3d17bfa42f92d204cb9d85c936d"}
Dec 26 05:35:42 kind-v1.32.0-worker kubelet[231]: I1226 05:35:42.032907     231 pod_container_deletor.go:80] "Container not found in pod's containers" containerID="220eab5b9bc2277f4ac7777dabb82a92bee9d3d17bfa42f92d204cb9d85c936d"
Dec 26 05:35:42 kind-v1.32.0-worker kubelet[231]: I1226 05:35:42.032934     231 util.go:48] "No ready sandbox for pod can be found. Need to start a new one" pod="default/sample-pod-0"
Dec 26 05:35:42 kind-v1.32.0-worker kubelet[231]: E1226 05:35:42.044794     231 pod_workers.go:1301] "Error syncing pod, skipping" err="get gRPC client for DRA driver gpu.example.com: plugin name gpu.example.com not found in the list of registered DRA plugins" pod="default/sample-pod-0" podUID="5598e74f-08ff-40ab-aba3-fa811874f9dc"
Dec 26 05:35:53 kind-v1.32.0-worker kubelet[231]: I1226 05:35:53.469847     231 util.go:48] "No ready sandbox for pod can be found. Need to start a new one" pod="default/sample-pod-0"
Dec 26 05:35:53 kind-v1.32.0-worker kubelet[231]: E1226 05:35:53.482748     231 pod_workers.go:1301] "Error syncing pod, skipping" err="get gRPC client for DRA driver gpu.example.com: plugin name gpu.example.com not found in the list of registered DRA plugins" pod="default/sample-pod-0" podUID="5598e74f-08ff-40ab-aba3-fa811874f9dc"
Dec 26 05:36:08 kind-v1.32.0-worker kubelet[231]: I1226 05:36:08.468746     231 util.go:48] "No ready sandbox for pod can be found. Need to start a new one" pod="default/sample-pod-0"
Dec 26 05:36:08 kind-v1.32.0-worker kubelet[231]: E1226 05:36:08.480012     231 pod_workers.go:1301] "Error syncing pod, skipping" err="get gRPC client for DRA driver gpu.example.com: plugin name gpu.example.com not found in the list of registered DRA plugins" pod="default/sample-pod-0" podUID="5598e74f-08ff-40ab-aba3-fa811874f9dc"
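
The error comes from the kubelet's DRA manager: to finish terminating the pod it needs a gRPC client for the gpu.example.com driver (to call NodeUnprepareResources), but the plugin was deregistered when the driver was uninstalled, so the sync keeps failing and the Pod stays in Terminating. Reinstalling the driver re-registers the plugin and lets the termination complete. With the dra-example-driver repository checked out, that is roughly (the exact chart path and values depend on how the demo was installed):

$ helm upgrade -i --create-namespace --namespace dra-example-driver \
    dra-example-driver deployments/helm/dra-example-driver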

Kubernetes version

$ kubectl version
Client Version: v1.32.0
Kustomize Version: v5.5.0
Server Version: v1.32.0

Cloud provider

none

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here

Install tools

$ kind version
kind v0.26.0 go1.23.4 linux/amd64

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

Labels

kind/bug, priority/backlog, sig/node, triage/accepted, wg/device-management