Insights: triton-inference-server/server
Overview
3 Pull requests merged by 2 people
- TPRD-1554: update readme and versions (#8244, merged Jun 11, 2025)
- Update default branch to track development for 2.60.0 / 25.07 (#8243, merged Jun 11, 2025)
- ci: fix the trtllm tests after the repo migration of trtllm backend (#8241, merged Jun 9, 2025)
2 Pull requests opened by 2 people
- feat: Add guided decoding support to OpenAI frontend (#8245, opened Jun 11, 2025)
- docs: fix capitalization of Triton Inference Server (#8252, opened Jun 13, 2025)
1 Issue closed by 1 person
- Not loaded: No model version was found (#7420, closed Jun 12, 2025)
6 Issues opened by 6 people
- Spike in Failed Inference Requests During Triton Server Shutdown (gRPC Endpoint) (#8253, opened Jun 15, 2025)
- Real latency is much higher; queue time is high (#8251, opened Jun 12, 2025)
- Triton deploys CPU service without releasing memory usage (#8250, opened Jun 12, 2025)
- CPU-only Docker base image is not available (#8249, opened Jun 12, 2025)
- How can I build the TensorRT engine for TensorFlow models in SavedModel format? (#8242, opened Jun 11, 2025)
- UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc4 in position 40692 (#8240, opened Jun 9, 2025)
2 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- fix: Support max_completion_tokens option in OpenAI frontend (#8226, commented on Jun 12, 2025; 1 new comment; see the sketch after this list)
- GPU memory leak when loading/unloading models (#5841, commented on Jun 11, 2025; 0 new comments)
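For readers following #8226: max_completion_tokens is the OpenAI Chat Completions parameter that supersedes the deprecated max_tokens, capping how many tokens the response may generate. Below is a minimal sketch of exercising it against an OpenAI-compatible endpoint with the openai Python client; the base URL, port, and model name are assumptions for illustration, not values confirmed by this digest.

```python
from openai import OpenAI

# Assumed: an OpenAI-compatible frontend running locally on port 9000.
# The frontend may not validate the API key, but the client requires one.
client = OpenAI(base_url="http://localhost:9000/v1", api_key="unused")

completion = client.chat.completions.create(
    model="llama-3.1-8b-instruct",  # hypothetical model name; use one your server exposes
    messages=[
        {"role": "user", "content": "Summarize Triton Inference Server in one sentence."}
    ],
    max_completion_tokens=64,  # cap on generated tokens; the option #8226 adds support for
)
print(completion.choices[0].message.content)
```

If the server predates the fix, an unsupported max_completion_tokens would typically be rejected or silently ignored; comparing completion.usage.completion_tokens against the cap is a quick way to verify the option took effect.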