Add micro batching and endpoints for v1 list_models and get_model #41

Open
baixiac wants to merge 3 commits into main from llm-gen2

baixiac (Member) commented Mar 6, 2026

feat: add endpoints for v1 models and list models
feat: add micro batching and lower CPU usage during model loading
feat: ensure the pad token for generative models
feat: use the async streamer during async generation
feat: apply timeout to text generation
fix: fix the property name for stop sequences in OpenAI requests
docker: add the GPU image build and remove per-model
chore: upgrade uv and tidy up the docker folder
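
For reviewers unfamiliar with the micro batching item above, here is a minimal sketch of the general technique, not the code in this PR. It assumes an asyncio-based server; `MicroBatcher`, `batch_fn`, `max_batch_size` and `max_wait_s` are hypothetical names chosen for illustration. Concurrent requests are queued, and a single worker groups them into one batched call once the batch is full or a short wait expires.

```python
import asyncio
import contextlib
from typing import Awaitable, Callable, List, Optional


class MicroBatcher:
    """Groups concurrent requests into small batches (illustrative only)."""

    def __init__(
        self,
        batch_fn: Callable[[List[str]], Awaitable[List[str]]],
        max_batch_size: int = 8,
        max_wait_s: float = 0.01,
    ) -> None:
        self._batch_fn = batch_fn          # hypothetical batched generate call
        self._max_batch_size = max_batch_size
        self._max_wait_s = max_wait_s
        self._queue: Optional[asyncio.Queue] = None
        self._worker: Optional[asyncio.Task] = None

    async def submit(self, prompt: str) -> str:
        # Create the queue and worker lazily so they bind to the running loop.
        if self._worker is None:
            self._queue = asyncio.Queue()
            self._worker = asyncio.create_task(self._run())
        fut = asyncio.get_running_loop().create_future()
        await self._queue.put((prompt, fut))
        return await fut

    async def _run(self) -> None:
        loop = asyncio.get_running_loop()
        while True:
            # Block until at least one request arrives, then collect more
            # until the batch is full or the wait window has elapsed.
            batch = [await self._queue.get()]
            deadline = loop.time() + self._max_wait_s
            while len(batch) < self._max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    item = await asyncio.wait_for(self._queue.get(), remaining)
                except asyncio.TimeoutError:
                    break
                batch.append(item)
            try:
                results = await self._batch_fn([p for p, _ in batch])
                for (_, fut), result in zip(batch, results):
                    if not fut.done():
                        fut.set_result(result)
            except Exception as exc:
                # Propagate a batch failure to every waiting caller.
                for _, fut in batch:
                    if not fut.done():
                        fut.set_exception(exc)

    async def close(self) -> None:
        if self._worker is not None:
            self._worker.cancel()
            with contextlib.suppress(asyncio.CancelledError):
                await self._worker
            self._worker = None
```

Callers would `await batcher.submit(prompt)` from each request handler. The wait window trades a small amount of added latency for fewer, larger model calls.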

baixiac force-pushed the llm-gen2 branch 2 times, most recently from 78b021e to 7f7c36f (March 10, 2026 10:48)