Add micro batching and endpoints for v1 list_models and get_model by baixiac · Pull Request #41 · CogStack/CogStack-ModelServe

baixiac · 2026-03-06T17:08:05Z

feat: add endpoints for v1 models and list models
feat: add micro batching and lower CPU usage during model loading
feat: ensure the pad token for generative models
feat: use the async streamer during async generation
feat: apply timeout to text generation
fix: fix the property name for stop sequences in OpenAI requests
docker: add the GPU image build and remove per-model Dockerfiles
chore: upgrade uv and tidy up the docker folder
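
For readers unfamiliar with micro batching, here is a minimal sketch of the general idea: requests arriving close together are collected into one batch before a single model call. It is not the code in this PR. `MicroBatcher`, `batch_fn`, `max_batch_size` and `max_wait_s` are hypothetical names chosen for illustration, and `batch_fn` stands in for whatever batched generation call the server actually uses.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional, Tuple


class MicroBatcher:
    """Collect concurrent requests and run them as one batch (illustrative only)."""

    def __init__(
        self,
        batch_fn: Callable[[List[str]], Awaitable[List[str]]],  # hypothetical batched call
        max_batch_size: int = 8,
        max_wait_s: float = 0.01,
    ) -> None:
        self._batch_fn = batch_fn
        self._max_batch_size = max_batch_size
        self._max_wait_s = max_wait_s
        self._queue: "asyncio.Queue[Tuple[str, asyncio.Future]]" = asyncio.Queue()
        self._worker: Optional[asyncio.Task] = None

    async def submit(self, prompt: str) -> str:
        # Start the background worker lazily, inside the running event loop.
        if self._worker is None:
            self._worker = asyncio.create_task(self._run())
        fut: asyncio.Future = asyncio.get_running_loop().create_future()
        await self._queue.put((prompt, fut))
        return await fut

    async def _run(self) -> None:
        loop = asyncio.get_running_loop()
        while True:
            # Block until at least one request is waiting.
            batch = [await self._queue.get()]
            deadline = loop.time() + self._max_wait_s
            # Keep collecting until the batch is full or the wait window closes.
            while len(batch) < self._max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self._queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            prompts = [prompt for prompt, _ in batch]
            try:
                results = await self._batch_fn(prompts)
                for (_, fut), result in zip(batch, results):
                    if not fut.done():
                        fut.set_result(result)
            except Exception as exc:
                # Propagate a batch failure to every caller waiting on it.
                for _, fut in batch:
                    if not fut.done():
                        fut.set_exception(exc)
```

Each request handler would call `await batcher.submit(prompt)`. The trade-off is a small added latency (at most `max_wait_s`) in exchange for fewer, larger model calls.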

feat: add micro batching and lower CPU usage during model loading
feat: ensure the pad token for generative models
feat: use the async streamer during async generation
feat: apply timeout to text generation
fix: fix the property name for stop sequences in OpenAI requests

baixiac force-pushed the llm-gen2 branch from 19fd6b0 to 4a9da58 Compare March 9, 2026 15:04

docker: add the GPU image build and remove per-model Dockerfiles

3823b76

baixiac force-pushed the llm-gen2 branch 2 times, most recently from 78b021e to 7f7c36f Compare March 10, 2026 10:48

chore: upgrade uv and tidy up the docker folder

184832d

baixiac force-pushed the llm-gen2 branch from 7f7c36f to 184832d Compare March 10, 2026 10:51


baixiac wants to merge 3 commits into main from llm-gen2
