feat: bind MLflow active run in worker threads for OpenTrace integration #11

Open

mjehanzaib999 wants to merge 1 commit into AgentOpt:main from mjehanzaib999:jz/feat/unified-telemetry-integration

Conversation

@mjehanzaib999

Summary

This PR adds MLflow active-run context propagation support for Trace-Bench worker threads, enabling seamless integration with the OpenTrace unified telemetry pipeline introduced in microsoft/Trace PR #64 (Milestone 2).

Problem

When Trace-Bench runs evaluation jobs with max_workers > 1, each worker thread loses access to MLflow's thread-local active run state. This causes telemetry spans emitted by OpenTrace's unified TelemetrySession to either:

  • Land outside the parent MLflow run (orphaned spans)
  • Fail silently, losing valuable optimization telemetry data
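The failure mode above can be reproduced with a minimal illustration: state stored via `threading.local` on the main thread is invisible to worker threads, which mirrors how MLflow stores its active-run stack (the variable names here are illustrative, not MLflow internals).

```python
import threading

# Thread-local storage, mirroring MLflow's active-run bookkeeping.
state = threading.local()
state.active_run = "run-123"   # set on the main thread only

seen = []

def worker():
    # Worker threads get a fresh thread-local namespace, so the
    # attribute set on the main thread is simply absent here.
    seen.append(getattr(state, "active_run", None))

t = threading.Thread(target=worker)
t.start()
t.join()
assert seen == [None]  # the worker sees no active run
```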

Solution

E1 — bind_active_run() context manager (trace_bench/integrations/mlflow_client.py)

  • Captures the MLflow active-run reference from the main thread
  • Re-attaches it in worker threads via a lightweight context manager
  • Safe no-op when MLflow is not configured or no active run exists
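A minimal sketch of what `bind_active_run()` could look like, assuming MLflow's public `mlflow.active_run()` / `mlflow.start_run(run_id=...)` API; the actual implementation in `trace_bench/integrations/mlflow_client.py` may differ:

```python
import contextlib

try:
    import mlflow
except ImportError:          # MLflow not installed: everything becomes a no-op
    mlflow = None


def get_active_run_context():
    """Capture the active run's ID on the main thread (None if unavailable)."""
    if mlflow is None:
        return None
    run = mlflow.active_run()
    return run.info.run_id if run is not None else None


@contextlib.contextmanager
def bind_active_run(run_id):
    """Re-attach a captured run inside a worker thread.

    Safe no-op when MLflow is not configured or no run was captured.
    """
    if mlflow is None or run_id is None:
        yield
        return
    # Resuming by run_id attaches the run to this thread's active-run stack,
    # so spans emitted here land under the correct parent run.
    with mlflow.start_run(run_id=run_id):
        yield
```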

E2 — Runner integration (trace_bench/runner.py)

  • Wraps _run_job() invocations with bind_active_run(mlflow_ctx) when workers > 1
  • Zero overhead in single-threaded mode — context binding is skipped entirely
  • Backward compatible — existing runner behavior unchanged when MLflow is disabled
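The runner-side wiring might look like the sketch below. The helper names (`get_active_run_context`, `bind_active_run`, `_run_job`) come from this PR, but the surrounding `run_jobs` function and its stubs are hypothetical, shown only to illustrate the branching between the single- and multi-worker paths:

```python
import contextlib
from concurrent.futures import ThreadPoolExecutor

# --- stand-in stubs for the real E1 helpers and job runner ---
@contextlib.contextmanager
def bind_active_run(ctx):
    yield

def get_active_run_context():
    return None

def _run_job(job):
    return job * 2
# -------------------------------------------------------------

def run_jobs(jobs, max_workers=1):
    if max_workers <= 1:
        # Single-threaded path: no context capture, zero overhead.
        return [_run_job(job) for job in jobs]

    # Capture once on the main thread, before workers spawn.
    mlflow_ctx = get_active_run_context()

    def worker(job):
        with bind_active_run(mlflow_ctx):
            return _run_job(job)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, jobs))
```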

E3 — Notebook documentation (notebooks/03_ui_launch_monitor.ipynb)

  • Updated to reflect the OpenTrace unified telemetry integration path
  • Documents how MLflow runs correlate with OTEL spans in multi-worker scenarios

How it works

# Main thread captures context
mlflow_ctx = get_active_run_context()

# Worker thread re-attaches it
with bind_active_run(mlflow_ctx):
    # All MLflow/OTEL calls inside here see the correct parent run
    run_evaluation_job(...)

Relationship to other PRs

| PR | Repo | Purpose |
| --- | --- | --- |
| microsoft/Trace #64 | microsoft/Trace | M2 unified telemetry (TelemetrySession, OTEL, TGJ) |
| This PR | AgentOpt/Trace-Bench | Worker-thread MLflow context propagation |

This PR depends on the unified TelemetrySession API from Trace #64 but can be merged independently — bind_active_run() is a no-op when the telemetry session is not active.

Test plan

  • bind_active_run() correctly propagates run context in threaded execution
  • Runner works with max_workers > 1 and MLflow enabled
  • Runner works with max_workers = 1 (no regression)
  • Runner works with MLflow disabled (no regression)
  • Notebook 03 renders correctly with updated documentation
  • No import errors when MLflow is not installed

E1: Add bind_active_run() context manager to mlflow_client.py —
    re-attaches MLflow active-run state in worker threads so OpenTrace
    unified telemetry spans land under the correct parent run.
E2: Wrap _run_job() call in runner.py with bind_active_run(mlflow_ctx)
    for max_workers > 1 scenarios.
E3: Update notebook 03 docs to reflect OpenTrace unified telemetry
    integration path.
