kube-scheduler does not consider hostPort ports used by initContainers when scheduling #132037

@avrittrohwer

What happened?

When running two pods that both use hostNetwork and have sidecar initContainers (restartPolicy: Always) using the same port, kube-scheduler allows the pods to be scheduled on the same node.

What did you expect to happen?

kube-scheduler should not schedule these two pods on the same node, matching its behavior for pods that use hostNetwork ports via a main container.

How can we reproduce it (as minimally and precisely as possible)?

  1. Create a kind cluster: kind create cluster. The cluster only has one node.
  2. Apply p1:
    apiVersion: v1
    kind: Pod
    metadata:
      name: p1
    spec:
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      initContainers:
      - name: init
        restartPolicy: Always
        image: python:3.12
        command:
        - /bin/bash
        - -c
        - python -m http.server 8081
        ports:
        - containerPort: 8081
      containers:
      - name: main
        image: python:3.12
        command:
        - /bin/bash
        - -c
        - sleep 10000
    
  3. Apply p2:
    apiVersion: v1
    kind: Pod
    metadata:
      name: p2
    spec:
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      initContainers:
      - name: init
        restartPolicy: Always
        image: python:3.12
        command:
        - /bin/bash
        - -c
        - python -m http.server 8081
        ports:
        - containerPort: 8081
      containers:
      - name: main
        image: python:3.12
        command:
        - /bin/bash
        - -c
        - sleep 10000
    
  4. p2 is scheduled onto the same node and crashes because p1 is already using port 8081.

Anything else we need to know?

kube-scheduler already accounts for hostNetwork ports used by main containers. For example:

  1. Create a kind cluster: kind create cluster. The cluster only has one node.
  2. Apply p1:
    apiVersion: v1
    kind: Pod
    metadata:
      name: p1
    spec:
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
      - name: main
        image: python:3.12
        command:
        - /bin/bash
        - -c
        - python -m http.server 8081
        ports:
        - containerPort: 8081
    
  3. Apply p2:
    apiVersion: v1
    kind: Pod
    metadata:
      name: p2
    spec:
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
      - name: main
        image: python:3.12
        command:
        - /bin/bash
        - -c
        - python -m http.server 8081
        ports:
        - containerPort: 8081
    
  4. p2 is not scheduled:
    Warning FailedScheduling 3m7s default-scheduler 0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
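The difference between the two reproductions can be modeled with a small sketch. This is not the kube-scheduler source. It is a simplified Python model that assumes the port check only collects ports from spec.containers, and that with hostNetwork: true each containerPort is treated as a host port. All names here (Container, Pod, requested_ports, fits, include_sidecars) are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Container:
    name: str
    host_ports: List[int] = field(default_factory=list)
    # "Always" marks a sidecar (restartable) init container; None otherwise.
    restart_policy: Optional[str] = None


@dataclass
class Pod:
    name: str
    containers: List[Container] = field(default_factory=list)
    init_containers: List[Container] = field(default_factory=list)


def requested_ports(pod: Pod, include_sidecars: bool) -> Set[int]:
    """Collect the host ports a pod asks for (hypothetical model)."""
    ports = {p for c in pod.containers for p in c.host_ports}
    if include_sidecars:
        # Sidecar init containers keep running next to the main containers,
        # so their ports stay bound for the lifetime of the pod.
        for c in pod.init_containers:
            if c.restart_policy == "Always":
                ports.update(c.host_ports)
    return ports


def fits(node_pods: List[Pod], incoming: Pod, include_sidecars: bool) -> bool:
    """Return True if the incoming pod's ports are free on the node."""
    used: Set[int] = set()
    for pod in node_pods:
        used |= requested_ports(pod, include_sidecars)
    return used.isdisjoint(requested_ports(incoming, include_sidecars))


def sidecar_pod(name: str) -> Pod:
    # Mirrors the first reproduction: port 8081 is on the sidecar init container.
    return Pod(name,
               containers=[Container("main")],
               init_containers=[Container("init", [8081], "Always")])


def main_pod(name: str) -> Pod:
    # Mirrors the second reproduction: port 8081 is on the main container.
    return Pod(name, containers=[Container("main", [8081])])


if __name__ == "__main__":
    # Sidecar ports ignored: the conflict is missed and p2 "fits".
    print(fits([sidecar_pod("p1")], sidecar_pod("p2"), include_sidecars=False))
    # Sidecar ports counted: p2 is rejected.
    print(fits([sidecar_pod("p1")], sidecar_pod("p2"), include_sidecars=True))
    # Main-container ports are always counted: p2 is rejected.
    print(fits([main_pod("p1")], main_pod("p2"), include_sidecars=False))
```

Under these assumptions, include_sidecars=False reproduces the reported behavior and include_sidecars=True matches the expected behavior. The model deliberately ignores non-restartable init containers, which this issue does not cover.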

Kubernetes version

$ kubectl version
Client Version: v1.31.9
Kustomize Version: v5.4.2
Server Version: v1.31.1

Cloud provider

local kind cluster

Metadata

Labels

kind/bug, priority/important-soon, sig/node, sig/scheduling, triage/accepted
