Description
What would you like to be added?
Now that QueueingHint is GA (#131973), I'd like to implement QueueingHints for my own plugins. Before I do, I'd like to propose two operational improvements. What do you think?
- Metrics to measure hint effectiveness
  queueing_hint_decisions_total{plugin="...", event="...", hint="Queue|QueueSkip"}
  I'm also considering whether to include scheduling outcomes:
  queueing_hint_effectiveness_total{plugin="...", event="...", hint="Queue", outcome="scheduled|failed"}
- Configuration to disable hints per plugin
  profiles:
  - schedulerName: default-scheduler
    disabledQueueingHints: ["NodeResourcesFit"]
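To make the two proposals concrete, here is a minimal, self-contained Go sketch of how they might fit together. Everything in it is an assumption for discussion rather than existing scheduler code: `QueueingHint`, `HintFn`, `Registry`, and `Evaluate` are local stand-ins (the real hint function signature is richer), and the `"Node/Add"` event label is only illustrative. The sketch also assumes that a disabled hint falls back to `Queue`, which matches how a plugin without a hint function is treated today, and that fallback decisions are not counted in `queueing_hint_decisions_total`.

```go
package main

import "fmt"

// QueueingHint is a local stand-in for the two hint values named in this issue.
type QueueingHint string

const (
	Queue     QueueingHint = "Queue"
	QueueSkip QueueingHint = "QueueSkip"
)

// HintFn is a simplified, hypothetical hint function signature.
type HintFn func(event string) QueueingHint

// decisionKey mirrors the labels of the proposed
// queueing_hint_decisions_total metric.
type decisionKey struct {
	Plugin, Event string
	Hint          QueueingHint
}

// Registry counts hint decisions and honors a per-plugin disable list,
// i.e. the proposed disabledQueueingHints field (hypothetical).
type Registry struct {
	disabled  map[string]bool
	decisions map[decisionKey]int
}

// NewRegistry builds a Registry from the list of plugins whose hints are disabled.
func NewRegistry(disabledQueueingHints []string) *Registry {
	r := &Registry{
		disabled:  map[string]bool{},
		decisions: map[decisionKey]int{},
	}
	for _, plugin := range disabledQueueingHints {
		r.disabled[plugin] = true
	}
	return r
}

// Evaluate runs the plugin's hint and records the decision. If the hint is
// disabled or missing, it returns Queue without recording anything
// (an assumption of this sketch, so that fallbacks do not skew the counter).
func (r *Registry) Evaluate(plugin, event string, fn HintFn) QueueingHint {
	if fn == nil || r.disabled[plugin] {
		return Queue
	}
	hint := fn(event)
	r.decisions[decisionKey{plugin, event, hint}]++
	return hint
}

// Count returns how many times a plugin returned the given hint for an event.
func (r *Registry) Count(plugin, event string, hint QueueingHint) int {
	return r.decisions[decisionKey{plugin, event, hint}]
}

func main() {
	r := NewRegistry([]string{"NodeResourcesFit"})
	alwaysSkip := func(string) QueueingHint { return QueueSkip }

	// The hint for NodeResourcesFit is disabled, so the fallback applies.
	fmt.Println(r.Evaluate("NodeResourcesFit", "Node/Add", alwaysSkip))
	// PodTopologySpread is not disabled, so its hint runs and is counted.
	fmt.Println(r.Evaluate("PodTopologySpread", "Node/Add", alwaysSkip))
	fmt.Println(r.Count("PodTopologySpread", "Node/Add", QueueSkip))
}
```

Whether a disabled hint should still be counted (for example under a separate label value) is a design choice this sketch leaves open.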
Why is this needed?
- Metrics: Identify which plugins provide accurate hints
- Control: Disable hints for problematic plugins without code changes
- Debugging: Isolate issues to specific plugins
Open Questions
Should we track scheduling outcomes to measure false positives (Queue→failed) and false negatives (QueueSkip→could have succeeded)?
/sig scheduling
cc: @sanposhiho @macsko 🙏