fix: Support max_completion_tokens option in OpenAI frontend #8226
Conversation
max_tokens: Optional[conint(ge=0)] = Field(
    16,
max_completion_tokens: Optional[conint(ge=0)] = Field(
    None,
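For readers skimming the diff, a minimal self-contained sketch of the two fields side by side (the class name and the bare `Field()` calls are illustrative, not the frontend's actual schema; the real definitions carry descriptions and more fields):

```python
from typing import Optional

from pydantic import BaseModel, Field, conint


class ChatCompletionRequest(BaseModel):  # illustrative name only
    # Deprecated field, kept for backward compatibility (default of 16 as in the diff above).
    max_tokens: Optional[conint(ge=0)] = Field(16)
    # New preferred field; None means "not provided by the client".
    max_completion_tokens: Optional[conint(ge=0)] = Field(None)
```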
Will the trtllm or vllm backend have a default value for this? What is the behavior if we leave both max_tokens and max_completion_tokens as None? Will it only generate one token? If so, I think it's better to have a default value here so that the user at least gets something.
For vLLM, there is no issue since the framework has a default value. However, for TRT-LLM, the max_tokens field is mandatory, so we need to establish a default value.

If I am not wrong, configuring the default value in the schema may create confusion in identifying whether the user provided max_tokens or max_completion_tokens, as both fields would always have values. To avoid this ambiguity and ensure that the request field is correctly identified, I moved the default value for max_tokens to the command line arguments (commit: 5e2756e). This change also allows users to customize the default value when starting the server.

Please let me know if this approach is feasible or if you have any suggestions. Thank you.
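To make the intent concrete, here is a rough sketch of the resolution order I have in mind; the function name, request object, and default_max_tokens argument are illustrative, with default_max_tokens standing in for the new command-line default:

```python
def resolve_max_tokens(request, default_max_tokens):
    """Illustrative only: pick the effective completion-token limit.

    Prefer the new max_completion_tokens field, fall back to the deprecated
    max_tokens field, and finally to the server-wide default supplied on the
    command line (needed because TRT-LLM requires a value).
    """
    if request.max_completion_tokens is not None:
        return request.max_completion_tokens
    if request.max_tokens is not None:
        return request.max_tokens
    return default_max_tokens
```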
cc: @rmccorm4
What does the PR do?

Adds support for the max_completion_tokens option in the OpenAI frontend. max_tokens remains supported for the time being to maintain backward compatibility.
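For illustration, a request that exercises the new field through the OpenAI-compatible /v1/chat/completions route (URL, port, and model name below are placeholders, not values from this PR):

```python
import requests

# Placeholder endpoint and model name; adjust to your deployment.
url = "http://localhost:9000/v1/chat/completions"
payload = {
    "model": "my-model",
    "messages": [{"role": "user", "content": "Say hello in one short sentence."}],
    "max_completion_tokens": 32,  # preferred over the deprecated max_tokens
}

response = requests.post(url, json=payload, timeout=60)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```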
Checklist

- PR title is of format <commit_type>: <Title>

Commit Type:
Check the conventional commit type box here and add the label to the GitHub PR.
Related PRs:
Where should the reviewer start?
Test plan:
Caveats:
Background

As mentioned in the API documentation, max_tokens is deprecated, and it is recommended to use max_completion_tokens instead: https://platform.openai.com/docs/api-reference/chat/create#chat-create-max_tokens

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)