fix: Support max_completion_tokens option in OpenAI frontend #8226
Conversation
max_tokens: Optional[conint(ge=0)] = Field(
    16,
max_completion_tokens: Optional[conint(ge=0)] = Field(
    None,
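For readers skimming the diff, a minimal self-contained sketch of the two fields side by side (the class name and the bare `Field()` calls are illustrative, not the frontend's actual schema; the real definitions carry descriptions and more fields):

```python
from typing import Optional

from pydantic import BaseModel, Field, conint


class ChatCompletionRequest(BaseModel):  # illustrative name only
    # Deprecated field, kept for backward compatibility (default of 16 as in the diff above).
    max_tokens: Optional[conint(ge=0)] = Field(16)
    # New preferred field; None means "not provided by the client".
    max_completion_tokens: Optional[conint(ge=0)] = Field(None)
```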
Will the trtllm or vllm backend have a default value for this? What is the behavior if we leave both max_tokens and max_completion_tokens as None? Will it only generate one token? If so, I think it's better to have a default value here so that the user at least gets something.
For vLLM, there is no issue since the framework has a default value. However, for TRT-LLM, the max_tokens field is mandatory, so we need to establish a default value.

If I am not wrong, configuring the default value in the schema may create confusion in identifying whether the user provided max_tokens or max_completion_tokens, as both fields would always have values. To avoid this ambiguity and ensure that the request field is correctly identified, I moved the default value for max_tokens to the command line arguments (commit: 5e2756e). This change also allows users to customize the default value when starting the server.

Please let me know if this approach is feasible or if you have any suggestions. Thank you.
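To make the intent concrete, here is a rough sketch of the resolution order I have in mind; the function name, request object, and default_max_tokens argument are illustrative, with default_max_tokens standing in for the new command-line default:

```python
def resolve_max_tokens(request, default_max_tokens):
    """Illustrative only: pick the effective completion-token limit.

    Prefer the new max_completion_tokens field, fall back to the deprecated
    max_tokens field, and finally to the server-wide default supplied on the
    command line (needed because TRT-LLM requires a value).
    """
    if request.max_completion_tokens is not None:
        return request.max_completion_tokens
    if request.max_tokens is not None:
        return request.max_tokens
    return default_max_tokens
```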
cc: @rmccorm4
What does the PR do?

Adds support for the max_completion_tokens option in the OpenAI frontend. max_tokens remains supported for the time being to maintain backward compatibility.
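For illustration, a request that exercises the new field through the OpenAI-compatible /v1/chat/completions route (URL, port, and model name below are placeholders, not values from this PR):

```python
import requests

# Placeholder endpoint and model name; adjust to your deployment.
url = "http://localhost:9000/v1/chat/completions"
payload = {
    "model": "my-model",
    "messages": [{"role": "user", "content": "Say hello in one short sentence."}],
    "max_completion_tokens": 32,  # preferred over the deprecated max_tokens
}

response = requests.post(url, json=payload, timeout=60)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```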
Checklist

- PR title is of format <commit_type>: <Title>

Commit Type:
Check the conventional commit type box here and add the label to the GitHub PR.
Related PRs:
Where should the reviewer start?
Test plan:
Caveats:
Background

As mentioned in the API documentation, max_tokens is deprecated, and it is recommended to use max_completion_tokens instead: https://platform.openai.com/docs/api-reference/chat/create#chat-create-max_tokens

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)