Summary
Add a built-in --export-otel CLI flag that exports AgentV evaluation traces to any OpenTelemetry-compatible backend (Langfuse, Braintrust, Confident AI, Jaeger, Grafana Tempo, Datadog, etc.) via OTLP/HTTP.
Motivation
AgentV already captures rich structured traces (OutputMessage[] with ToolCall[], timing, token usage) and computes TraceSummary for every eval run. Users want to send these traces to observability platforms for:
- Debugging agent execution flows visually (span trees)
- Monitoring tool call patterns, latency, and costs across runs
- Comparing agent configurations in platform dashboards
- Integrating with existing LLMOps tooling (Langfuse, Braintrust, Datadog)
Industry evidence
- Braintrust uses OTel-compatible spans for unified offline/online tracing and accepts OTLP ingestion
- Langfuse (v3.22+) accepts OTLP/HTTP at `/api/public/otel`
- Google ADK-js uses OpenTelemetry as its native tracing layer
- LangWatch converts traces to OTel format
- GenAI semantic conventions (`gen_ai.*`) are standardized across the industry
Stale PRs superseded by this issue
| PR | What it did | Status | Why superseded |
|---|---|---|---|
| #136 | Standalone example: JSONL → OTLP/HTTP export script (Confident/Langfuse) | Draft, stale since Jan 9 | Good proof-of-concept but consumer-side only; this issue proposes built-in CLI support |
| #92 | Langfuse-specific SDK integration (openspec proposal + design doc) | Draft, stale since Jan 1 | Vendor-locked to Langfuse SDK; this issue proposes vendor-neutral OTel export |
Key insight from reviewing the stale PRs: PR #92 proposed using the Langfuse SDK directly (a --langfuse flag), which creates vendor lock-in. PR #136 used the standard OTel SDK (`@opentelemetry/exporter-trace-otlp-http`) and mapped to multiple backends — this is the right approach. This issue takes #136's OTel-native approach and makes it a built-in CLI feature.
Design
Principle: OTel-native, vendor-neutral
Use @opentelemetry/exporter-trace-otlp-http directly — no vendor-specific SDKs. Backend configuration is handled entirely via standard OTel environment variables and optional AgentV backend presets.
This aligns with AgentV's architecture principles:
- No vendor lock-in — any OTLP-compatible backend works
- CLI-first — single flag enables export
- Plugin-friendly — custom exporters can extend it via the `code_judge` pattern
OutputMessage → OTel Span Mapping
| AgentV Concept | OTel Span | Attributes |
|---|---|---|
| Eval run (per test case) | Root trace span | `agentv.test_id`, `agentv.target`, `agentv.dataset`, `agentv.score` |
| Assistant message | Child span (`gen_ai.generation`) | `gen_ai.request.model`, `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens` |
| ToolCall | Child span (`gen_ai.tool`) | `gen_ai.tool.name`, `gen_ai.tool.call.id` |
| EvaluatorResult | Span event or attribute | `agentv.evaluator.name`, `agentv.evaluator.score`, `agentv.evaluator.verdict` |
Span hierarchy:
```
Trace: agentv.eval (test_id="case-001")
├── Span: gen_ai.generation (assistant response)
│   └── attributes: model, token usage, cost
├── Span: gen_ai.tool (tool="search")
│   └── attributes: tool name, duration
├── Span: gen_ai.tool (tool="read_file")
│   └── attributes: tool name, duration
└── Events: evaluator scores attached to root span
```
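For illustration, here is a minimal sketch of this mapping using `@opentelemetry/api`. The `EvalCaseResult` shape and `exportCaseAsSpans` helper are simplified assumptions; the real `EvaluationResult`/`OutputMessage` fields may differ.

```typescript
import { context, trace } from "@opentelemetry/api";

// Hypothetical, simplified shape of one completed test case (for illustration only).
interface EvalCaseResult {
  testId: string;
  target: string;
  score: number;
  messages: Array<
    | { kind: "assistant"; model: string; inputTokens: number; outputTokens: number }
    | { kind: "tool_call"; toolName: string; callId: string }
  >;
}

export function exportCaseAsSpans(result: EvalCaseResult): void {
  const tracer = trace.getTracer("agentv");

  // Root span: one per eval case, carrying the agentv.* attributes.
  const root = tracer.startSpan("agentv.eval", {
    attributes: {
      "agentv.test_id": result.testId,
      "agentv.target": result.target,
      "agentv.score": result.score,
    },
  });

  // Child spans nest under the root via the active context.
  const ctx = trace.setSpan(context.active(), root);
  for (const msg of result.messages) {
    const child =
      msg.kind === "assistant"
        ? tracer.startSpan(
            "gen_ai.generation",
            {
              attributes: {
                "gen_ai.request.model": msg.model,
                "gen_ai.usage.input_tokens": msg.inputTokens,
                "gen_ai.usage.output_tokens": msg.outputTokens,
              },
            },
            ctx,
          )
        : tracer.startSpan(
            "gen_ai.tool",
            {
              attributes: {
                "gen_ai.tool.name": msg.toolName,
                "gen_ai.tool.call.id": msg.callId,
              },
            },
            ctx,
          );
    child.end();
  }

  root.end();
}
```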
CLI Interface
```bash
# Export to any OTLP backend
agentv eval tests.yaml --export-otel

# Backend is configured via standard OTel env vars
OTEL_EXPORTER_OTLP_ENDPOINT=https://cloud.langfuse.com/api/public/otel \
OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic <base64>" \
agentv eval tests.yaml --export-otel

# Or use built-in backend presets for convenience
agentv eval tests.yaml --export-otel --otel-backend langfuse
agentv eval tests.yaml --export-otel --otel-backend braintrust
```
Backend Presets (Convenience Only)
Presets auto-configure OTEL_EXPORTER_OTLP_ENDPOINT and auth headers from well-known env vars:
| Preset | Endpoint | Auth Env Vars |
|---|---|---|
| `langfuse` | `https://cloud.langfuse.com/api/public/otel/v1/traces` | `LANGFUSE_PUBLIC_KEY`, `LANGFUSE_SECRET_KEY` → Basic Auth |
| `braintrust` | Via `BRAINTRUST_API_KEY` | `BRAINTRUST_API_KEY` |
| `confident` | `https://otel.confident-ai.com/v1/traces` | `CONFIDENT_API_KEY` → `x-confident-api-key` header |
| (none) | Uses `OTEL_EXPORTER_OTLP_ENDPOINT` directly | Uses `OTEL_EXPORTER_OTLP_HEADERS` directly |
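A minimal sketch of how preset resolution could work. The `resolvePreset` helper and `OtelBackendConfig` shape are assumptions; the endpoints are the ones from the table above, and Braintrust is omitted because its endpoint is configured via `BRAINTRUST_API_KEY` rather than a fixed URL listed here.

```typescript
// Hypothetical shape of a resolved backend configuration.
export interface OtelBackendConfig {
  endpoint: string;
  headers: Record<string, string>;
}

export function resolvePreset(
  preset: string | undefined,
  env: NodeJS.ProcessEnv = process.env,
): OtelBackendConfig {
  switch (preset) {
    case "langfuse": {
      // Langfuse uses HTTP Basic auth built from its public/secret key pair.
      const basic = Buffer.from(
        `${env.LANGFUSE_PUBLIC_KEY}:${env.LANGFUSE_SECRET_KEY}`,
      ).toString("base64");
      return {
        endpoint: "https://cloud.langfuse.com/api/public/otel/v1/traces",
        headers: { Authorization: `Basic ${basic}` },
      };
    }
    case "confident":
      return {
        endpoint: "https://otel.confident-ai.com/v1/traces",
        headers: { "x-confident-api-key": env.CONFIDENT_API_KEY ?? "" },
      };
    default:
      // No preset: pass the standard OTel env vars through untouched.
      return {
        endpoint: env.OTEL_EXPORTER_OTLP_ENDPOINT ?? "",
        headers: parseOtlpHeaders(env.OTEL_EXPORTER_OTLP_HEADERS),
      };
  }
}

// OTEL_EXPORTER_OTLP_HEADERS uses "key1=value1,key2=value2" syntax.
function parseOtlpHeaders(raw: string | undefined): Record<string, string> {
  if (!raw) return {};
  const headers: Record<string, string> = {};
  for (const pair of raw.split(",")) {
    const idx = pair.indexOf("=");
    if (idx > 0) headers[pair.slice(0, idx).trim()] = pair.slice(idx + 1).trim();
  }
  return headers;
}
```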
Privacy: Content Capture
Following PR #92's design (and Azure SDK / Google ADK patterns):
- Default: Do NOT export message content or tool inputs/outputs — only metadata, timing, scores
- Opt-in: `--otel-capture-content` flag or `AGENTV_OTEL_CAPTURE_CONTENT=true` to include full content (see the sketch below)
- Rationale: traces may contain PII, secrets, or proprietary code
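A minimal sketch of the default redaction, assuming a hypothetical per-span payload shape and helper names that are not part of the current codebase:

```typescript
// Hypothetical per-span payload before it is handed to the OTLP exporter.
interface SpanPayload {
  attributes: Record<string, string | number>; // timing, token usage, scores
  content?: string;                            // message text
  toolInput?: unknown;
  toolOutput?: unknown;
}

// Default: drop anything that could contain PII, secrets, or proprietary code.
// Metadata in `attributes` always survives.
export function applyContentPolicy(payload: SpanPayload, captureContent: boolean): SpanPayload {
  if (captureContent) return payload;
  return { attributes: payload.attributes };
}

// Capture stays off unless the CLI flag or env var explicitly enables it.
export function contentCaptureEnabled(cliFlag: boolean): boolean {
  return cliFlag || process.env.AGENTV_OTEL_CAPTURE_CONTENT === "true";
}
```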
Error Handling
- Export failures log a warning and do NOT fail the evaluation
- Flush pending spans before CLI exits (with timeout)
- Missing credentials when `--export-otel` is set → warning, proceed without export (see the sketch below)
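A minimal sketch of the non-fatal flush-on-exit behaviour using the OTel Node SDK. The `shutdownTracing` helper is an assumption, and the `addSpanProcessor` setup shown is the SDK 1.x style.

```typescript
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";
import { BatchSpanProcessor, NodeTracerProvider } from "@opentelemetry/sdk-trace-node";

// With no options, the exporter reads OTEL_EXPORTER_OTLP_ENDPOINT /
// OTEL_EXPORTER_OTLP_HEADERS (or the values resolved from a preset).
const provider = new NodeTracerProvider();
provider.addSpanProcessor(new BatchSpanProcessor(new OTLPTraceExporter())); // SDK 1.x API
provider.register();

// Called once at the end of `agentv eval`; never throws into the eval result.
export async function shutdownTracing(timeoutMs = 5000): Promise<void> {
  try {
    // forceFlush drains queued spans; race it against a timeout so an
    // unreachable backend cannot hang the CLI on exit.
    await Promise.race([
      provider.forceFlush(),
      new Promise((_, reject) =>
        setTimeout(() => reject(new Error("flush timeout")), timeoutMs),
      ),
    ]);
    await provider.shutdown();
  } catch (err) {
    console.warn(`[agentv] OTel export flush failed: ${(err as Error).message}`);
  }
}
```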
Implementation Plan
Phase 1: Core OTel Exporter
- Add `@opentelemetry/api`, `@opentelemetry/exporter-trace-otlp-http`, `@opentelemetry/sdk-trace-node`, `@opentelemetry/resources`, `@opentelemetry/semantic-conventions` as optional dependencies
- Create `packages/core/src/observability/otel-exporter.ts` — converts `EvaluationResult` + `OutputMessage[]` → OTel spans
- Create `packages/core/src/observability/types.ts` — `TraceExporter` interface (sketched below)
- Wire into eval orchestrator: after each test case completes, if `--export-otel` is set, export the result
- Flush on eval completion
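A minimal sketch of what the `TraceExporter` interface in `packages/core/src/observability/types.ts` could look like; the `ExporterOptions` shape and the import path are assumptions:

```typescript
import type { EvaluationResult } from "../types"; // assumed import path

export interface ExporterOptions {
  /** Mirrors --otel-capture-content / AGENTV_OTEL_CAPTURE_CONTENT. */
  captureContent: boolean;
}

export interface TraceExporter {
  /** Convert and send one completed test case; must never throw into the eval loop. */
  export(result: EvaluationResult, options: ExporterOptions): Promise<void>;
  /** Drain any pending spans before the CLI exits, bounded by a timeout. */
  flush(timeoutMs?: number): Promise<void>;
}
```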
Phase 2: Backend Presets
- Add preset config for Langfuse, Braintrust, Confident AI
- `--otel-backend <name>` maps to endpoint + auth header construction
Phase 3: Content Control + Polish
- Content filtering (strip message content / tool I/O when capture disabled)
- Documentation + example
Acceptance Criteria
- `--export-otel` flag sends OTLP/HTTP traces to the configured endpoint
- Span hierarchy: root (eval case) → children (assistant messages, tool calls)
- `gen_ai.*` semantic conventions for LLM and tool attributes
- `TraceSummary` metrics as root span attributes
- Backend presets for Langfuse, Braintrust, Confident AI
- Privacy-first: no content exported by default
- `--otel-capture-content` enables full content export
- Export failures don't fail the evaluation
- Works with the `--trace` flag (full output messages) and without it (uses `TraceSummary`)
- Unit tests for span conversion
- Supersedes PRs feat: Langfuse trace export #92 and Add OTEL exporter example (Confident/Langfuse) #136 (close both when this ships)
Effort Estimate
3-5 days (reuses mapping patterns from PR #136)
Relation to Existing Work
- PR Add OTEL exporter example (Confident/Langfuse) #136 (draft): Proved the OTel approach works. This issue promotes it to built-in.
- PR feat: Langfuse trace export #92 (draft): Good design doc but Langfuse-specific. This issue generalizes to any OTel backend.
- `--trace` flag (merged in feat(cli): add --trace flag for persisting execution traces (#172) #186): already persists `OutputMessage[]` to JSONL; OTel export reuses the same data.
- `TraceSummary` (merged in feat(core): add span-based timing to trace types (#172) #185): compact trace metadata always available — used as span attributes even without `--trace`.