Improve the containerized deployment story #39

robertvazan · 2026-02-23T20:56:56Z

robertvazan
Feb 23, 2026

I tried to setup llms.py in a podman container and connect it to my local Ollama instances, but I have encountered several issues and the test drive ultimately failed despite a lot of effort. Let me describe the main pain points.

Entrypoint script: There's a prebuilt container image (ghcr.io/servicestack/llms:latest), which is good, and it asks for a single volume to save data, which is again good. The trouble is that the volume requires initialization. There seems to be some entrypoint script that handles volume initialization, but that means I cannot customize llms parameters like port and verbosity. At least my attempts to run the container with something like llms --serve 12345 --verbose have failed. Things only started working once I started relying on the built-in entrypoint script for volume initialization and generally proper server setup. I let llms.py run on its default port and expose a different port via podman configuration. I have no idea how to enable verbose logging now.

Logging: Not sure how llms.py writes logs, but I had to set log driver in podman to journald to see any logs, because nothing was logged under default settings. Even then the error log is no more detailed than the brief error I got in the UI. As explained above, I wasn't able to turn on verbose logging.

Baked-in configuration: There's no way to bake my custom Ollama endpoints into the container image or to otherwise supply them to llms.py automatically. I am expected to manually edit llms.json after the volume is initialized. In order to automate that, I would have to (1) run the service once and wait for the volume to be initialized, (2) read llms.json from the volume, (3) have a Python script patch in my Ollama endpoints, and (4) write the modified llms.json into the volume while the service is temporarily stopped. That's way too complicated. Why cannot we just drop a file with predefined configuration somewhere that llms.py would merge into its dynamic configuration? I have also found no way to disable tools by default in the config file.

Failing Ollama requests: So I have added one Ollama endpoint manually. Its models show up in the UI, but when I try to start a chat, I get [Errno None] Can not write request body for http://localhost:11436/v1/chat/completions error. Logs don't say anything more. It might be because it's a year old Intel IPEX fork of Ollama, which is based on an even older upstream. I will try to upgrade someday and try again. But then OpenAI endpoint is part of Ollama for ages. It should work even if the Ollama version is old.

Authentication: I gather there's some github auth extension, but that's an overkill and maybe a security problem of its own in a local setup. Why not a simple username/password? Without authentication, I wonder how much access to llms.py is granted to random websites I visit by llms.py's CORS configuration. Flatpak apps have full access to localhost ports too. Fronting by a reverse proxy is not an option on localhost.

Complexity: I have spent several hours exploring numerous blind alleys and asked ChatGPT like 20 different questions about various aspects of setting up llms.py. ChatGPT often had to dig in the source code for answers, probably due to insufficient documentation. This ought to be easier.

Anyways, I like what you are doing here and I hope llms.py will keep improving. For now, I just want to leave feedback from my test drive here. Feel free to discard any part that does not align with your goals.

mythz · 2026-02-24T01:41:38Z

mythz
Feb 24, 2026
Maintainer

Thanks for the feedback, I've moved this to a discussion since it's a better fit for general feedback - issues need to be clear and focused, describe a single, specific, actionable bug or feature request with enough detail to reproduce or implement. This post covers several different topics (entrypoint customization, logging, config management, authentication, Ollama compatibility) each of which would ideally be its own issue with sufficient context if you'd like to see specific changes made.

The docs for Docker is at https://llmspy.org/docs/deployment/docker which includes examples for how to run it, e.g:

docker run -p 8000:8000 -e OPENROUTER_API_KEY="your-key" ghcr.io/servicestack/llms:latest

To help manage multiple environment variables and other configuration using Docker Compose is recommended. You can also use environment variables for Verbose (VERBOSE=1) and Debug (DEBUG=1) logging. Logging currently only gets written to stdout.

It's not clear how you're attempting to run a custom commands like llms --serve 12345 --verbose, are you making your own custom Docker container?

There's no way to bake my custom Ollama endpoints into the container image or to otherwise supply them to llms.py automatically.

Not clear about this issue, it should be reading your llms.json from your volume, if it doesn't exist it has to create it, but you should be able to use your own custom one if it exists. You can start with the default llms.json from the repo and customize to suit. Where you should be able statically define only the models you want available with map_models otherwise it will attempt to load the models at runtime from your Ollama endpoint.

Not sure what's causing your Ollama error, try enable Verbose and Debug Logging to see if it can provide more context about the error.

Authentication

I plan on adding a simple credentials auth system after the major features I'm currently working on (Themes / Custom Agents). Feel free to watch that issue to get notified when it's available.

Complexity

Unfortunately general feedback like this that aren't actionable on their own - they don't say what specifically was confusing or what documentation was missing. If you ran into a specific step where the docs were wrong, unclear, or absent, that's something that can be looked at. Every project could be "easier" but without knowing the specific gaps, there's nothing concrete to improve.

As this is a new project I don't expect any of it to be in LLM training sets, I try to publish extensive documentation at https://llmspy.org/docs/ that should hopefully cover most use cases.

0 replies

robertvazan · 2026-02-24T02:33:50Z

robertvazan
Feb 24, 2026
Author

It's not clear how you're attempting to run a custom commands like llms --serve 12345 --verbose, are you making your own custom Docker container?

The container accepts commands. If no command is given, it will run as a server. But if I give it a command after the image name, say llms --serve 12345 --verbose, it seems to try to run it. I thought that's an intended feature.

2 replies

mythz Feb 24, 2026
Maintainer

My preference would be to use env vars for customizing how Docker containers are run. If there's a useful feature missing that requires custom CLI args, LMK and can look at adding ENV var option for it.

robertvazan Feb 24, 2026
Author

Port number is the one env variable missing. While docker/podman can remap the port, having the same port in the container is a good practice that makes for less confusing logs. I always change the port inside the container for all services I am running.

robertvazan · 2026-02-24T02:48:00Z

robertvazan
Feb 24, 2026
Author

There's no way to bake my custom Ollama endpoints into the container image or to otherwise supply them to llms.py automatically.

Not clear about this issue, it should be reading your llms.json from your volume, if it doesn't exist it has to create it, but you should be able to use your own custom one if it exists. You can start with the default llms.json from the repo and customize to suit.

This wouldn't do. The example is large and I would have to keep updating my starting llms.json as the project evolves. That's harder than manual edits. Secondly, this would be a one-time initialization persisted in the volume, so I would not be able to change the provider list just by rebuilding my container. Not good for software-defined environments.

Most server software reads immutable config files that can be baked into a container image or a VM by configuration/build scripts. That makes for straightforward setup that LLMs can handle well.

3 replies

mythz Feb 24, 2026
Maintainer

Not following, llms.json should rarely change, since llms.json inherits models.dev providers.json, only the providers.json is automatically updated, not llms.json. You can also use ~/.llms/providers-extra.json to add Model Providers missing from models.dev providers.json, which gets merged when providers.json is updated.

robertvazan Feb 24, 2026
Author

Just to clarify, changing "rarely" means changing too often when you want something immutable that you can easily set up and depend on. If you want to keep mutable llms.json, one workable approach would be to extend existing command-line options with an idempotent command to add new endpoint. People could then have scripts that run the llms CLI right after starting the container.

I tried to use providers-extra.json, but my understanding is that it lists models from models.dev, which are detached from specific endpoints. Endpoints then get configured in llms.json. At least that's how I understood it after examining content of the two files.

mythz Feb 24, 2026
Maintainer

You should be able to do everything you need with an extension. E.g. you can extend the CLI args with an extension parser(parser) hook and add custom endpoints within the __install__(ctx) hook. parser doesn't have context since it's executed immediately before one is created.

The extension docs also has a number of examples of adding custom endpoints with ctx.add_get, ctx.add_post, etc. The __install__ hook is also executed before the server starts.

robertvazan · 2026-02-24T03:01:55Z

robertvazan
Feb 24, 2026
Author

The docs for Docker is at https://llmspy.org/docs/deployment/docker

I had to adapt this a bit for my environment (podman, systemd quadlets, scripted setup), but I mostly followed it (missed the VERBOSE=1 though, thanks for the tip). One thing I noticed is that tricks like -v $(pwd)/my-llms.json:/home/llms/.llms/llms.json:ro do not really work. The server tries to write to llms.json. First it was to upgrade it (I didn't know about the version field), but it also wanted to write there when version field was present, probably to populate it with defaults (providers and such). This is one of the core issues I found. I am fighting for control over llms.json with the server.

2 replies

mythz Feb 24, 2026
Maintainer

llms.json is the configuration store for maintaining which providers are enabled, so things like --enable, --disable need to update it, or changing the --default model or --reset-config to reset it back to the latest config. Otherwise it shouldn't update it.

robertvazan Feb 24, 2026
Author

I think it writes it once when it initializes it with data from providers.json. Once is enough to break the service if the bind-mount is read-only at that time.

Improve the containerized deployment story #39

Uh oh!

robertvazan Feb 23, 2026

Replies: 4 comments · 7 replies

Uh oh!

Uh oh!

mythz Feb 24, 2026 Maintainer

Authentication

Complexity

Uh oh!

robertvazan Feb 24, 2026 Author

Uh oh!

mythz Feb 24, 2026 Maintainer

Uh oh!

robertvazan Feb 24, 2026 Author

Uh oh!

robertvazan Feb 24, 2026 Author

Uh oh!

Uh oh!

mythz Feb 24, 2026 Maintainer

Uh oh!

robertvazan Feb 24, 2026 Author

Uh oh!

mythz Feb 24, 2026 Maintainer

Uh oh!

robertvazan Feb 24, 2026 Author

Uh oh!

mythz Feb 24, 2026 Maintainer

Uh oh!

robertvazan Feb 24, 2026 Author

robertvazan
Feb 23, 2026

Replies: 4 comments 7 replies

mythz
Feb 24, 2026
Maintainer

robertvazan
Feb 24, 2026
Author

mythz Feb 24, 2026
Maintainer

robertvazan Feb 24, 2026
Author

robertvazan
Feb 24, 2026
Author

mythz Feb 24, 2026
Maintainer

robertvazan Feb 24, 2026
Author

mythz Feb 24, 2026
Maintainer

robertvazan
Feb 24, 2026
Author

mythz Feb 24, 2026
Maintainer

robertvazan Feb 24, 2026
Author