
fix: Support max_completion_tokens option in OpenAI frontend #8226


Open
pskiran1 wants to merge 14 commits into main

Conversation

pskiran1 (Member) commented May 30, 2025

What does the PR do?

  • Added support for max_completion_tokens.
  • Kept max_tokens support for the time being to maintain backward compatibility.
  • Added test cases.

Checklist

  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated the GitHub labels field.
  • Added a test plan and verified the tests pass.
  • Verified that the PR passes existing CI.
  • Verified copyright is correct on all changed files.
  • Added a succinct git squash message before merging ref.
  • All template sections are filled out.
  • Optional: Additional screenshots for behavior/output changes with before/after.

Commit Type:

Check the conventional commit type box here and add the label to the GitHub PR.

  • build
  • ci
  • docs
  • feat
  • fix
  • perf
  • refactor
  • revert
  • style
  • test

Related PRs:

Where should the reviewer start?

Test plan:

  • CI Pipeline ID: TBD

Caveats:

Background

As noted in the OpenAI API documentation, max_tokens is deprecated and max_completion_tokens is recommended instead: https://platform.openai.com/docs/api-reference/chat/create#chat-create-max_tokens
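
For illustration, a client request using the new field might look like this (a minimal sketch, assuming a recent openai Python client; the port, model name, and endpoint are assumptions, not part of this PR):

    from openai import OpenAI

    # Assumption: the OpenAI-compatible frontend is serving at localhost:9000.
    client = OpenAI(base_url="http://localhost:9000/v1", api_key="unused")

    response = client.chat.completions.create(
        model="llama-3.1-8b-instruct",  # assumed model name
        messages=[{"role": "user", "content": "Hello!"}],
        max_completion_tokens=64,  # preferred over the deprecated max_tokens
    )
    print(response.choices[0].message.content)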

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

pskiran1 added the PR: fix (A bug fix) and openai (OpenAI related) labels on Jun 4, 2025
pskiran1 changed the title from "Support max_completion_tokens option in OpenAI frontend" to "fix: Support max_completion_tokens option in OpenAI frontend" on Jun 4, 2025
pskiran1 marked this pull request as ready for review on June 5, 2025 08:43
Schema diff under discussion (truncated in the review view):

    max_tokens: Optional[conint(ge=0)] = Field(
        16,
    max_completion_tokens: Optional[conint(ge=0)] = Field(
        None,
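
Filled out, the two fields might look roughly like this (a sketch only; the enclosing model name and the description strings are assumptions, following the Pydantic v1-style conint used in the diff):

    from typing import Optional

    from pydantic import BaseModel, Field, conint

    class CreateChatCompletionRequest(BaseModel):
        # Deprecated upstream; retained for backward compatibility.
        max_tokens: Optional[conint(ge=0)] = Field(
            16, description="Deprecated; use max_completion_tokens instead."
        )
        # New field; None means the user did not set it.
        max_completion_tokens: Optional[conint(ge=0)] = Field(
            None, description="Upper bound on generated completion tokens."
        )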
Contributor commented:

Will the TRT-LLM backend or vLLM backend have a default value for this? What's the behavior if we leave both max_tokens and max_completion_tokens as None? Will it generate only one token? If so, I think it's better to have a default value here so that the user at least gets something.

pskiran1 (Member, Author) replied:

For vLLM, there is no issue since the framework has a default value.
However, for TRT-LLM, the max_tokens field is mandatory, so we need to establish a default value.

If I am not mistaken, configuring the default value in the schema may create confusion in identifying whether the user provided max_tokens or max_completion_tokens, since both fields would always have values. To avoid this ambiguity and ensure the request field is correctly identified, I moved the default value for max_tokens to the command-line arguments (commit 5e2756e).
This change also allows users to customize the default value when starting the server.

Please let me know if this approach is feasible or if you have any suggestions.
Thank you.
cc: @rmccorm4
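
A minimal sketch of the precedence the author describes (function and flag names are hypothetical, not the PR's actual code):

    # Hypothetical sketch: resolve the effective token limit for a request.
    def resolve_max_tokens(request, cli_default_max_tokens: int) -> int:
        # Prefer the non-deprecated field when the user supplied it.
        if request.max_completion_tokens is not None:
            return request.max_completion_tokens
        # Fall back to the deprecated field for backward compatibility.
        if request.max_tokens is not None:
            return request.max_tokens
        # Neither was provided: use the server-wide default passed on the
        # command line (needed because TRT-LLM requires max_tokens).
        return cli_default_max_tokens

With both schema defaults set to None, the frontend can tell which field the user actually provided, and the fallback stays configurable at startup (e.g. via a hypothetical --default-max-tokens flag).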

richardhuo-nv previously approved these changes Jun 9, 2025
Labels: openai (OpenAI related), PR: fix (A bug fix)

3 participants