
QHint: Add effectiveness metrics and per-plugin control #132093

Open
@utam0k

Description


What would you like to be added?

Now that QueueingHint is GA (#131973), I'd like to implement QHints for my plugins. Before doing so, I think the following improvements would help. What do you think?

I'd like to propose two operational improvements:

  • Metrics to measure hint effectiveness:
    queueing_hint_decisions_total{plugin="...", event="...", hint="Queue|QueueSkip"}
    I'm also considering a second metric that tracks scheduling outcomes:
    queueing_hint_effectiveness_total{plugin="...", event="...", hint="Queue", outcome="scheduled|failed"}

  • Configuration to disable hints per plugin, for example:

    profiles:
    - schedulerName: default-scheduler
      disabledQueueingHints: ["NodeResourcesFit"]
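A minimal Go sketch of how the two proposals could fit together, assuming a wrapper around each plugin's hint function. `QueueingHint`, `HintFn`, `HintRecorder`, `Evaluate`, and `Dump` are hypothetical stand-ins, not the scheduler framework's real types (the real `QueueingHintFn` also receives the pod and the old/new objects and returns an error). It also assumes a disabled plugin should behave like a plugin that registers an event without a hint function, i.e. always return Queue.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// QueueingHint is a hypothetical stand-in for the framework's Queue/QueueSkip values.
type QueueingHint int

const (
	QueueSkip QueueingHint = iota
	Queue
)

func (h QueueingHint) String() string {
	if h == Queue {
		return "Queue"
	}
	return "QueueSkip"
}

// HintFn is a simplified, hypothetical hint signature (event name in, hint out).
type HintFn func(event string) QueueingHint

// decisionKey holds the labels of the proposed queueing_hint_decisions_total metric.
type decisionKey struct {
	Plugin, Event, Hint string
}

// HintRecorder counts hint decisions and honors a per-plugin disable list.
type HintRecorder struct {
	mu       sync.Mutex
	disabled map[string]bool
	counts   map[decisionKey]int
}

// NewHintRecorder builds a recorder from a list like disabledQueueingHints.
func NewHintRecorder(disabledPlugins []string) *HintRecorder {
	d := make(map[string]bool, len(disabledPlugins))
	for _, p := range disabledPlugins {
		d[p] = true
	}
	return &HintRecorder{disabled: d, counts: map[decisionKey]int{}}
}

// Evaluate runs the plugin's hint unless hints are disabled for that plugin,
// in which case it falls back to Queue (requeue on every registered event).
func (r *HintRecorder) Evaluate(plugin, event string, fn HintFn) QueueingHint {
	hint := Queue
	if !r.disabled[plugin] {
		hint = fn(event)
	}
	r.mu.Lock()
	r.counts[decisionKey{plugin, event, hint.String()}]++
	r.mu.Unlock()
	return hint
}

// Dump renders the counters as sorted, Prometheus-style text lines.
func (r *HintRecorder) Dump() []string {
	r.mu.Lock()
	defer r.mu.Unlock()
	lines := make([]string, 0, len(r.counts))
	for k, v := range r.counts {
		lines = append(lines, fmt.Sprintf(`queueing_hint_decisions_total{plugin=%q,event=%q,hint=%q} %d`,
			k.Plugin, k.Event, k.Hint, v))
	}
	sort.Strings(lines)
	return lines
}

func main() {
	rec := NewHintRecorder([]string{"NodeResourcesFit"})
	// Hypothetical hint: only a node update might make the pod schedulable.
	fitHint := func(event string) QueueingHint {
		if event == "NodeUpdate" {
			return Queue
		}
		return QueueSkip
	}
	rec.Evaluate("NodeAffinity", "NodeAdd", fitHint)
	rec.Evaluate("NodeAffinity", "NodeUpdate", fitHint)
	rec.Evaluate("NodeResourcesFit", "NodeAdd", fitHint) // disabled: forced to Queue
	fmt.Println(strings.Join(rec.Dump(), "\n"))
}
```

Recording the forced Queue under the disabled plugin's label keeps the series comparable before and after disabling, at the cost of not distinguishing "hint said Queue" from "hint bypassed"; a separate label could make that explicit.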

Why is this needed?

  • Metrics: Identify which plugins provide accurate hints
  • Control: Disable hints for problematic plugins without code changes
  • Debugging: Isolate issues to specific plugins

Open Questions

Should we track scheduling outcomes to measure false positives (Queue→failed) and false negatives (QueueSkip→could have succeeded)?
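One way to measure false positives (Queue→failed) is to remember which plugin and event requeued a pod and attribute the outcome of the pod's next scheduling attempt to it. The Go sketch below assumes that model; `OutcomeTracker`, `RecordQueue`, `RecordAttempt`, and `FalsePositiveRatio` are hypothetical names. It attributes the attempt to the last plugin that returned Queue, and a failed attempt may be caused by a different plugin, so the ratio is an upper bound on hint inaccuracy. False negatives (QueueSkip→could have succeeded) are not directly observable this way and are not covered.

```go
package main

import (
	"fmt"
	"sync"
)

// effectivenessKey holds the labels of the proposed
// queueing_hint_effectiveness_total{plugin, event, hint="Queue", outcome} metric.
type effectivenessKey struct {
	Plugin, Event, Outcome string
}

// requeueCause records which plugin/event returned Queue for a pod.
type requeueCause struct {
	Plugin, Event string
}

// OutcomeTracker links a Queue hint to the result of the pod's next attempt.
type OutcomeTracker struct {
	mu      sync.Mutex
	pending map[string]requeueCause // keyed by pod UID
	counts  map[effectivenessKey]int
}

func NewOutcomeTracker() *OutcomeTracker {
	return &OutcomeTracker{pending: map[string]requeueCause{}, counts: map[effectivenessKey]int{}}
}

// RecordQueue is called when a plugin's Queue hint requeues the pod.
func (t *OutcomeTracker) RecordQueue(podUID, plugin, event string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.pending[podUID] = requeueCause{plugin, event}
}

// RecordAttempt is called after the pod's next scheduling attempt finishes.
func (t *OutcomeTracker) RecordAttempt(podUID string, scheduled bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	cause, ok := t.pending[podUID]
	if !ok {
		return // attempt was not triggered by a tracked Queue hint
	}
	delete(t.pending, podUID)
	outcome := "failed"
	if scheduled {
		outcome = "scheduled"
	}
	t.counts[effectivenessKey{cause.Plugin, cause.Event, outcome}]++
}

// FalsePositiveRatio is the fraction of a plugin's Queue hints whose next
// attempt failed: failed / (scheduled + failed). Returns 0 with no data.
func (t *OutcomeTracker) FalsePositiveRatio(plugin string) float64 {
	t.mu.Lock()
	defer t.mu.Unlock()
	var total, failed int
	for k, v := range t.counts {
		if k.Plugin != plugin {
			continue
		}
		total += v
		if k.Outcome == "failed" {
			failed += v
		}
	}
	if total == 0 {
		return 0
	}
	return float64(failed) / float64(total)
}

func main() {
	t := NewOutcomeTracker()
	t.RecordQueue("pod-a", "NodeResourcesFit", "NodeUpdate")
	t.RecordAttempt("pod-a", true)
	t.RecordQueue("pod-b", "NodeResourcesFit", "NodeUpdate")
	t.RecordAttempt("pod-b", false)
	t.RecordQueue("pod-c", "NodeResourcesFit", "PodDelete")
	t.RecordAttempt("pod-c", false)
	fmt.Printf("%.2f\n", t.FalsePositiveRatio("NodeResourcesFit")) // prints 0.67
}
```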

/sig scheduling

cc: @sanposhiho @macsko 🙏

Metadata


Assignees

No one assigned

Labels

needs-sig: Indicates an issue or PR lacks a `sig/foo` label and requires one.
needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
