feat: OpenTelemetry trace export (OTLP/HTTP) #277

@christso

Summary

Add a built-in --export-otel CLI flag that exports AgentV evaluation traces to any OpenTelemetry-compatible backend (Langfuse, Braintrust, Confident AI, Jaeger, Grafana Tempo, Datadog, etc.) via OTLP/HTTP.

Motivation

AgentV already captures rich structured traces (OutputMessage[] with ToolCall[], timing, token usage) and computes TraceSummary for every eval run. Users want to send these traces to observability platforms for:

  • Debugging agent execution flows visually (span trees)
  • Monitoring tool call patterns, latency, and costs across runs
  • Comparing agent configurations in platform dashboards
  • Integrating with existing LLMOps tooling (Langfuse, Braintrust, Datadog)

Industry evidence

  • Braintrust uses OTel-compatible spans for unified offline/online tracing and accepts OTLP ingestion (research)
  • Langfuse (v3.22+) accepts OTLP/HTTP at /api/public/otel
  • Google ADK-js uses OpenTelemetry as its native tracing layer
  • LangWatch converts traces to OTel format
  • The OpenTelemetry GenAI semantic conventions (gen_ai.*) give vendors a shared attribute vocabulary for LLM and tool spans, although they are still marked as in development

Stale PRs superseded by this issue

| PR | What it did | Status | Why superseded |
|----|-------------|--------|----------------|
| #136 | Standalone example: JSONL → OTLP/HTTP export script (Confident/Langfuse) | Draft, stale since Jan 9 | Good proof-of-concept but consumer-side only; this issue proposes built-in CLI support |
| #92 | Langfuse-specific SDK integration (openspec proposal + design doc) | Draft, stale since Jan 1 | Vendor-locked to the Langfuse SDK; this issue proposes vendor-neutral OTel export |

Key insight from reviewing the stale PRs: PR #92 proposed using the Langfuse SDK directly (a --langfuse flag), which creates vendor lock-in. PR #136 used the standard OTel SDK (@opentelemetry/exporter-trace-otlp-http) and mapped traces to multiple backends, which is the right approach. This issue takes #136's OTel-native approach and makes it a built-in CLI feature.

Design

Principle: OTel-native, vendor-neutral

Use @opentelemetry/exporter-trace-otlp-http directly, with no vendor-specific SDKs. Backends are configured entirely through the standard OTel environment variables, plus optional AgentV backend presets.

This aligns with AgentV's architecture principles:

  • No vendor lock-in: any OTLP-compatible backend works
  • CLI-first: a single flag enables export
  • Plugin-friendly: custom exporters can extend this via the code_judge pattern

OutputMessage → OTel Span Mapping

| AgentV concept | OTel span | Attributes |
|----------------|-----------|------------|
| Eval run (per test case) | Root trace span | agentv.test_id, agentv.target, agentv.dataset, agentv.score |
| Assistant message | Child span (gen_ai.generation) | gen_ai.request.model, gen_ai.usage.input_tokens, gen_ai.usage.output_tokens |
| ToolCall | Child span (gen_ai.tool) | gen_ai.tool.name, gen_ai.tool.call.id |
| EvaluatorResult | Span event or attribute | agentv.evaluator.name, agentv.evaluator.score, agentv.evaluator.verdict |

Span hierarchy:

```
Trace: agentv.eval (test_id="case-001")
├── Span: gen_ai.generation (assistant response)
│   └── attributes: model, token usage, cost
├── Span: gen_ai.tool (tool="search")
│   └── attributes: tool name, duration
├── Span: gen_ai.tool (tool="read_file")
│   └── attributes: tool name, duration
└── Events: evaluator scores attached to root span
```
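A minimal sketch of the mapping above, assuming simplified stand-ins for AgentV's OutputMessage, ToolCall and EvaluatorResult types (the real field names may differ). To stay dependency-free it builds plain span records instead of calling the OTel SDK; EvalCaseResult, SpanRecord, leaf and toSpanTree are hypothetical names used only for illustration.

```typescript
// Simplified, hypothetical stand-ins for AgentV's types; real field names may differ.
interface ToolCall {
  id: string;
  name: string;
}

interface OutputMessage {
  role: "assistant" | "user" | "tool";
  model?: string;
  inputTokens?: number;
  outputTokens?: number;
  toolCalls?: ToolCall[];
}

interface EvaluatorResult {
  name: string;
  score: number;
  verdict: string;
}

interface EvalCaseResult {
  testId: string;
  target: string;
  dataset: string;
  score: number;
  messages: OutputMessage[];
  evaluators: EvaluatorResult[];
}

type Attrs = Record<string, string | number>;

// Plain record standing in for an OTel span (no SDK dependency).
interface SpanRecord {
  name: string;
  attributes: Attrs;
  events: { name: string; attributes: Attrs }[];
  children: SpanRecord[];
}

function leaf(name: string, attributes: Attrs): SpanRecord {
  return { name, attributes, events: [], children: [] };
}

// One root span per test case; assistant messages and tool calls become its children.
function toSpanTree(result: EvalCaseResult): SpanRecord {
  const root = leaf("agentv.eval", {
    "agentv.test_id": result.testId,
    "agentv.target": result.target,
    "agentv.dataset": result.dataset,
    "agentv.score": result.score,
  });

  // Evaluator results are attached to the root span as events.
  for (const e of result.evaluators) {
    root.events.push({
      name: "agentv.evaluator",
      attributes: {
        "agentv.evaluator.name": e.name,
        "agentv.evaluator.score": e.score,
        "agentv.evaluator.verdict": e.verdict,
      },
    });
  }

  for (const msg of result.messages) {
    if (msg.role !== "assistant") continue;
    const gen: Attrs = {};
    if (msg.model !== undefined) gen["gen_ai.request.model"] = msg.model;
    if (msg.inputTokens !== undefined) gen["gen_ai.usage.input_tokens"] = msg.inputTokens;
    if (msg.outputTokens !== undefined) gen["gen_ai.usage.output_tokens"] = msg.outputTokens;
    root.children.push(leaf("gen_ai.generation", gen));

    // Tool spans are siblings of the generation span, as in the hierarchy above.
    for (const call of msg.toolCalls ?? []) {
      root.children.push(
        leaf("gen_ai.tool", { "gen_ai.tool.name": call.name, "gen_ai.tool.call.id": call.id }),
      );
    }
  }
  return root;
}
```

In a real exporter each SpanRecord would become a span started from an OTel Tracer, and tool-call duration would come from the span's start and end times rather than from an attribute.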

CLI Interface

```bash
# Export to any OTLP backend
agentv eval tests.yaml --export-otel

# Backend is configured via standard OTel env vars
OTEL_EXPORTER_OTLP_ENDPOINT=https://cloud.langfuse.com/api/public/otel \
OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic <base64>" \
agentv eval tests.yaml --export-otel

# Or use built-in backend presets for convenience
agentv eval tests.yaml --export-otel --otel-backend langfuse
agentv eval tests.yaml --export-otel --otel-backend braintrust
```
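For the env-var path, a sketch of how the traces URL and headers could be resolved, assuming the standard OTLP/HTTP exporter conventions: OTEL_EXPORTER_OTLP_TRACES_ENDPOINT is used verbatim, the generic OTEL_EXPORTER_OTLP_ENDPOINT gets /v1/traces appended, and OTEL_EXPORTER_OTLP_HEADERS is a comma-separated list of key=value pairs with percent-encoded values. resolveOtlpConfig and parseOtlpHeaders are hypothetical helpers; the OTel JS exporter already reads these variables itself, so AgentV would likely need this logic only for presets or diagnostics.

```typescript
interface OtlpConfig {
  tracesUrl: string;
  headers: Record<string, string>;
}

type Env = Record<string, string | undefined>;

// Parses "k1=v1,k2=v2" where values may be percent-encoded (e.g. "Basic%20abc").
function parseOtlpHeaders(raw: string): Record<string, string> {
  const headers: Record<string, string> = {};
  for (const pair of raw.split(",")) {
    const idx = pair.indexOf("=");
    if (idx <= 0) continue; // skip empty or malformed entries
    const key = pair.slice(0, idx).trim();
    // Split on the first "=" only, so base64 padding in the value survives.
    headers[key] = decodeURIComponent(pair.slice(idx + 1).trim());
  }
  return headers;
}

// Signal-specific endpoint is used verbatim; the generic one gets "/v1/traces" appended.
function resolveOtlpConfig(env: Env): OtlpConfig | undefined {
  const specific = env.OTEL_EXPORTER_OTLP_TRACES_ENDPOINT;
  const base = env.OTEL_EXPORTER_OTLP_ENDPOINT;
  let tracesUrl: string;
  if (specific) {
    tracesUrl = specific;
  } else if (base) {
    tracesUrl = base.replace(/\/+$/, "") + "/v1/traces";
  } else {
    return undefined; // no endpoint configured: warn and skip export
  }
  return { tracesUrl, headers: parseOtlpHeaders(env.OTEL_EXPORTER_OTLP_HEADERS ?? "") };
}
```

Under these rules the Langfuse base endpoint in the second example resolves to the same .../api/public/otel/v1/traces URL that the langfuse preset uses.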

Backend Presets (Convenience Only)

Presets auto-configure OTEL_EXPORTER_OTLP_ENDPOINT and auth headers from well-known env vars:

| Preset | Endpoint | Auth env vars |
|--------|----------|---------------|
| langfuse | https://cloud.langfuse.com/api/public/otel/v1/traces | LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY → Basic Auth |
| braintrust | Via BRAINTRUST_API_KEY | BRAINTRUST_API_KEY |
| confident | https://otel.confident-ai.com/v1/traces | CONFIDENT_API_KEY → x-confident-api-key header |
| (none) | Uses OTEL_EXPORTER_OTLP_ENDPOINT directly | Uses OTEL_EXPORTER_OTLP_HEADERS directly |
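A sketch of preset resolution for two of the presets in the table, assuming Node's Buffer for base64 encoding. Braintrust is left out because the table does not pin down its endpoint. resolvePreset, PresetName and PresetConfig are hypothetical names; returning undefined lets the caller emit the "missing credentials" warning and continue without export.

```typescript
// Hypothetical preset names covered by this sketch (braintrust omitted: endpoint unspecified).
type PresetName = "langfuse" | "confident";
type Env = Record<string, string | undefined>;

interface PresetConfig {
  tracesUrl: string;
  headers: Record<string, string>;
}

// Returns undefined when the preset's credentials are missing, so the CLI can warn and skip.
function resolvePreset(name: PresetName, env: Env): PresetConfig | undefined {
  switch (name) {
    case "langfuse": {
      const publicKey = env.LANGFUSE_PUBLIC_KEY;
      const secretKey = env.LANGFUSE_SECRET_KEY;
      if (!publicKey || !secretKey) return undefined;
      // Basic Auth: base64("<public>:<secret>").
      const token = Buffer.from(`${publicKey}:${secretKey}`).toString("base64");
      return {
        tracesUrl: "https://cloud.langfuse.com/api/public/otel/v1/traces",
        headers: { Authorization: `Basic ${token}` },
      };
    }
    case "confident": {
      const apiKey = env.CONFIDENT_API_KEY;
      if (!apiKey) return undefined;
      return {
        tracesUrl: "https://otel.confident-ai.com/v1/traces",
        headers: { "x-confident-api-key": apiKey },
      };
    }
  }
}
```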

Privacy: Content Capture

Following PR #92's design (and Azure SDK / Google ADK patterns):

  • Default: Do NOT export message content or tool inputs/outputs — only metadata, timing, scores
  • Opt-in: --otel-capture-content flag or AGENTV_OTEL_CAPTURE_CONTENT=true to include full content
  • Rationale: Traces may contain PII, secrets, or proprietary code
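A minimal sketch of the capture-content switch, assuming content-bearing attributes share a hypothetical agentv.content. prefix (the actual attribute names are not settled in this issue). shouldCaptureContent and filterContent are illustrative names.

```typescript
type Attrs = Record<string, string | number | boolean>;
type Env = Record<string, string | undefined>;

// Hypothetical prefix for content-bearing attributes (message text, tool inputs/outputs).
const CONTENT_PREFIX = "agentv.content.";

// Capture is enabled by either the CLI flag or AGENTV_OTEL_CAPTURE_CONTENT=true.
function shouldCaptureContent(flag: boolean | undefined, env: Env): boolean {
  if (flag) return true;
  return (env.AGENTV_OTEL_CAPTURE_CONTENT ?? "").toLowerCase() === "true";
}

// Drops content attributes unless capture is enabled; metadata, timing and scores pass through.
function filterContent(attrs: Attrs, capture: boolean): Attrs {
  if (capture) return attrs;
  const kept: Attrs = {};
  for (const [key, value] of Object.entries(attrs)) {
    if (!key.startsWith(CONTENT_PREFIX)) kept[key] = value;
  }
  return kept;
}
```

Filtering at attribute level, just before spans are emitted, keeps the privacy default in one place instead of scattering checks through the span-conversion code.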

Error Handling

  • Export failures log a warning and do NOT fail the evaluation
  • Flush pending spans before CLI exits (with timeout)
  • Missing credentials when --export-otel is set → warning, proceed without export
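The flush-with-timeout rule could look like the sketch below. flushWithTimeout is a hypothetical helper that accepts any async flush function; with the OTel SDK one would plausibly pass the tracer provider's forceFlush() or shutdown(), both of which return promises. It reports the outcome as a boolean and only ever warns, so export problems never fail the evaluation.

```typescript
// Races the flush against a timer; never throws, only warns.
async function flushWithTimeout(
  flush: () => Promise<void>,
  timeoutMs: number,
  warn: (msg: string) => void = console.warn,
): Promise<boolean> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<"timeout">((resolve) => {
    timer = setTimeout(() => resolve("timeout"), timeoutMs);
  });
  try {
    const outcome = await Promise.race([flush().then(() => "ok" as const), timeout]);
    if (outcome === "timeout") {
      warn(`OTel flush timed out after ${timeoutMs} ms; some spans may be lost`);
      return false;
    }
    return true;
  } catch (err) {
    // Export failure: log and continue, the eval result stands.
    warn(`OTel export failed: ${String(err)}`);
    return false;
  } finally {
    // Clear the timer so a pending timeout does not keep the CLI process alive.
    clearTimeout(timer);
  }
}
```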

Implementation Plan

Phase 1: Core OTel Exporter

  1. Add @opentelemetry/api, @opentelemetry/exporter-trace-otlp-http, @opentelemetry/sdk-trace-node, @opentelemetry/resources, @opentelemetry/semantic-conventions as optional dependencies
  2. Create packages/core/src/observability/otel-exporter.ts: converts EvaluationResult + OutputMessage[] → OTel spans
  3. Create packages/core/src/observability/types.ts: TraceExporter interface
  4. Wire into the eval orchestrator: after each test case completes, if --export-otel is set, export the result
  5. Flush on eval completion
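One possible shape for the TraceExporter interface in types.ts and the orchestrator hook from steps 3 to 5, kept deliberately small. Every name here (TraceExporter, ExportableCase, InMemoryTraceExporter, runWithExport) is hypothetical; an in-memory implementation like this one could also back the span-conversion unit tests listed in the acceptance criteria.

```typescript
// Hypothetical minimal result shape; the real EvaluationResult carries much more.
interface ExportableCase {
  testId: string;
  score: number;
}

// Hypothetical interface for types.ts.
interface TraceExporter {
  exportResult(result: ExportableCase): Promise<void>;
  flush(): Promise<void>;
}

// Collects results in memory instead of sending them anywhere.
class InMemoryTraceExporter implements TraceExporter {
  readonly exported: ExportableCase[] = [];

  async exportResult(result: ExportableCase): Promise<void> {
    this.exported.push(result);
  }

  async flush(): Promise<void> {}
}

// Orchestrator hook: export each finished case, then flush; export errors only warn.
async function runWithExport(
  cases: ExportableCase[],
  exporter: TraceExporter | undefined,
  warn: (msg: string) => void = console.warn,
): Promise<void> {
  for (const c of cases) {
    // ...the real orchestrator would evaluate test case `c` here...
    if (!exporter) continue;
    try {
      await exporter.exportResult(c);
    } catch (err) {
      warn(`OTel export failed for ${c.testId}: ${String(err)}`);
    }
  }
  if (!exporter) return;
  try {
    await exporter.flush();
  } catch (err) {
    warn(`OTel flush failed: ${String(err)}`);
  }
}
```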

Phase 2: Backend Presets

  1. Add preset config for Langfuse, Braintrust, Confident AI
  2. --otel-backend <name> maps to endpoint + auth header construction

Phase 3: Content Control + Polish

  1. Content filtering (strip message content / tool I/O when capture disabled)
  2. Documentation + example

Acceptance Criteria

  • --export-otel flag sends OTLP/HTTP traces to configured endpoint
  • Span hierarchy: root (eval case) → children (assistant messages, tool calls)
  • gen_ai.* semantic conventions for LLM and tool attributes
  • TraceSummary metrics as root span attributes
  • Backend presets for Langfuse, Braintrust, Confident AI
  • Privacy-first: no content exported by default
  • --otel-capture-content enables full content export
  • Export failures don't fail the evaluation
  • Works with --trace flag (full output messages) and without (uses TraceSummary)
  • Unit tests for span conversion
  • Supersedes PRs #92 ("feat: Langfuse trace export") and #136 ("Add OTEL exporter example (Confident/Langfuse)"); close both when this ships

Effort Estimate

3-5 days (reuses mapping patterns from PR #136)

Relation to Existing Work

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests