Skip to content

feat(medcat-trainer): Add OTEL tracing. Add docker build cache#364

Merged
alhendrickson merged 11 commits intomainfrom
feat/medcat-trainer/add-otel-tracing
Mar 10, 2026
Merged

feat(medcat-trainer): Add OTEL tracing. Add docker build cache#364
alhendrickson merged 11 commits intomainfrom
feat/medcat-trainer/add-otel-tracing

Conversation

@alhendrickson
Copy link
Collaborator

@alhendrickson alhendrickson commented Mar 9, 2026

What does this do?

This adds optional tracing to medcat-trainer

It makes a span for every API call recieved, and every request out that is made. Can optionally enable all the sql calls but it's super noisy.

I've also added more spans/detail/events for calls to SOLR, Medcat library, and medcat-service calls

Example

See calls to medcat-service. The trace propagates through so can see the underlying call to cat.get_entities is actually coming from the remote service
image

Call to SOLR - showing solr returns 404, but our code has a JSONDecodeError
image

@alhendrickson alhendrickson marked this pull request as ready for review March 10, 2026 11:27
Copy link
Collaborator

@mart-r mart-r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

A comment about all the dependencies. I.e since I don't think they're main dependencies, we may want to group them together as an optional extra. But not sure its worth it.

"django-health-check>=3.22",
"requests>=2.31",
"opentelemetry-sdk>=1.40.0",
"opentelemetry-distro[otlp]>=0.61b0",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can these perhaps be set up as an optional extra? I.e deep-telemetry or something? And if that's enabled, you get all those along with all the detailed telemetry instead of the basic ones?

That would - of course - involve setting up another env var for this, reading it, and changing the install command based on it in the Dockerfile - so may not be worth it.

Copy link
Collaborator Author

@alhendrickson alhendrickson Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey I get this idea, but I don't think it's really worth it it in this exact scenario

I 100% agree with doing this if this if this was a library. However here I kind of don't want to support trainer being installed in any other way outside of the docker image we build, and I'd rather not make different docker images for these deps. Looking around most things just bake it into the image (eg keycloak, solr, prefect), they all just have it ready to be turned on in the image). The libraries promise to be small and noop though

In future I'd like to add tracing into the medcat library itself, so when that happens then I'd definitely go with your idea and make them an optional dependency there. May go with langfuse & would really not want to bring that dep in for the 99.999% of people that dont need it.

Copy link
Collaborator

@mart-r mart-r Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main idea was to split into multiple images (with and without these extras). But you're right, it doesn't really make a lot of sense for us to manage multiple different types of images for the same thing (at least not at this time or for this specific reason).

With that said, we could still separate these to an optional extra. And since we control the way the image is built, we can just always include them.
Doesn't really change much other than just separating a bunch of these dependencies into a separate list. While this might be a little confusing as well, to me it would clear up the visible dependencies of the project. I.e the project doesn't necessarily depend on all of these. They're nice to haves, not necessities. We'll bake them in anyway, but it might be nice to separate the true dependencies (i.e something this wouldn't work without) from the nice-to-haves.

But this is still a matter of opinion. Having a separate list for packages that always get included anyway might also become a burden (i.e "which list do I add this new thing to?").

Copy link
Collaborator Author

@alhendrickson alhendrickson Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! Yeah I see your point here, it makes a lot of sense to separate these a bit

I've updated here - I've put them into a different group (noting to include the deps directly imported), but still hardcoded in the image to include the extra ones. I'm thinking this is the mid path you're looking for

I've tried to follow https://opentelemetry.io/docs/concepts/instrumentation/libraries/#libraries-should-only-use-the-opentelemetry-api with the required deps. It says to bring in only opentelemetry-api at version 1.0.0 or higher. To note, this is likely to be the pattern followed if we were to also do this inside medcat, the 2 deps would probably be added for all users but they promise that they do nothing unless the opentelemetry-sdk is included too.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried to follow https://opentelemetry.io/docs/concepts/instrumentation/libraries/#libraries-should-only-use-the-opentelemetry-api with the required deps. It says to bring in only opentelemetry-api at version 1.0.0 or higher. To note, this is likely to be the pattern followed if we were to also do this inside medcat, the 2 deps would probably be added for all users but they promise that they do nothing unless the opentelemetry-sdk is included too.

This might not be as relevant here as this is trainer - it's not a library someone else will install. But certainly will be for future MedCAT use cases.

Looks good to me overall!

torch = { index = "pytorch-cpu" }

[[tool.uv.index]]
name = "pytorch-cpu"
Copy link
Collaborator Author

@alhendrickson alhendrickson Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For future reference - This was added to fix this error. Open to this being the wrong approach, but this does fix it at least. Can't see a wheel being used in medcat-v2 to best understand what the right pattern is for it - but it feels reasonable to have to opt in to using that index, instead of eveyrthing trying to use it by default

(medcattrainer-webapp) vscode ➜ .../CogStack/cogstack-nlp/medcat-trainer/webapp (main) $ uv add opentelemetry-sdk
Resolved 105 packages in 2.68s
error: Distribution `markupsafe==3.0.3 @ registry+https://download.pytorch.org/whl/cpu/` can't be installed because it doesn't have a source distribution or wheel for the current platform

hint: You're using CPython 3.13 (`cp313`), but `markupsafe` (v3.0.3) only has wheels with the following Python implementation tag: `cp314`

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think trainer is unlikely to be running in GPU enabled situations (at least for now). Plus, this will also make the image smaller since it won't be installing all the CUDA stuff.

With that said, it's one of those things I've seen before. Where uv picks a version for you to use that's not actually usable in your environment. If I recall, this is because uv (in order to be as quick as it is at resolving dependencies) will skip some (many? not sure) checks regarding specific versions that it picks for usage.

To me, it feels like a lot of newer python stuff (like uv) is built around people who are always on the latest python version. E.g when you build a new website from scratch that depends on very few things other than your web framework. And for those this will not be an issue. But because we depend on medcat, which in turn depends on a dozen other things, which brings in a lot of transitive dependencies as well, we end up a few versions behind on python.

@alhendrickson alhendrickson merged commit b387581 into main Mar 10, 2026
10 checks passed
@alhendrickson alhendrickson deleted the feat/medcat-trainer/add-otel-tracing branch March 10, 2026 16:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants