Skip to content

Comments

otel: add new module#61907

Open
bengl wants to merge 1 commit intonodejs:mainfrom
bengl:bengl/otel
Open

otel: add new module#61907
bengl wants to merge 1 commit intonodejs:mainfrom
bengl:bengl/otel

Conversation

@bengl
Copy link
Member

@bengl bengl commented Feb 20, 2026

Adds support for OpenTelemetry tracing, as described in the included docs. Custom instrumentation is not yet supported.

This PR is intended to lay the groundwork for future PRs where API support for custom instrumentations is added. The OpenTelemetry SDK is explicitly not used, mainly to avoid bloat.

A programmatic API to enable and disable tracing is added as a new module, node:otel, but I'm very open to suggestions in terms of where to add it instead, keeping in mind that the intent is to eventually expose an API for custom instrumentation alongside it.

Disclaimer: As an experiment, Claude Code was used to build this PR, based on a hand-written prototype I built (but did not publish) about a year ago. Everything was thoroughly reviewed by me prior to PR submission, including some hand-edits.

Adds support for OpenTelemetry tracing, as described in the included
docs.

Custom instrumentation is not yet supported.
@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/config
  • @nodejs/loaders
  • @nodejs/startup

@nodejs-github-bot nodejs-github-bot added lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels Feb 20, 2026
Copy link
Member

@legendecas legendecas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT, it would create confusion that the built-in support does not work with the @opentelemetry/api package, which is intended to be the global provider for otel components, like propagators and exporters.

@mcollina
Copy link
Member

I personally find the opentelemetry apis really cumbersome to work with. Maybe there is something better that can be done.

But I also concur that some kind of high level compatibility api would need to exist.

@ljharb
Copy link
Member

ljharb commented Feb 20, 2026

It would be really nice to get ahold of https://www.npmjs.com/package/otel and add this new core module without the prefix.

otel.start({ endpoint: 'http://collector.example.com:4318' });

// ... application code ...
otel.stop();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More a style nit... feel free to ignore. I find this start()/stop() pattern a bit cumbersome. I'd be happier something like...

const stop = otel.start();

stop();

which also allows for...

{
  using session = otel.start();
  // automatically stops
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While using is great, I don't think it's actually appropriate here. This is for enable and disabling an entire subsystem. stop() would probably actually be pretty rare, probably only used for some kind of error recovery / process-teardown. It definitely doesn't fit the usual use case for using, where we'd expect a resource to be disposed at the end of a variable scope.

* `options` {Object}
* `endpoint` {string} The OTLP collector endpoint URL.
**Default:** `'http://localhost:4318'`.
* `filter` {string|string\[]} Optional comma-separated string or array of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: can we pick one? I'd prefer just string[].

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the way it is for consistency with the environment variable. It could be an array here in the API and then comma-separated in the environment variable.

@legendecas
Copy link
Member

legendecas commented Feb 20, 2026

This PR is intended to lay the groundwork for future PRs where API support for custom instrumentations is added. The OpenTelemetry SDK is explicitly not used, mainly to avoid bloat.
keeping in mind that the intent is to eventually expose an API for custom instrumentation alongside it.

Agree that an OpenTelemetry SDK is not built-in to the Node.js instrumentation. However, this implementation is not fully compliant with the OpenTelemetry design. The instrumentation should not implement the OpenTelemetry SDK components, like exporters and tracer SDK.

The three OpenTelemetry layers, API, SDK, and instrumentations, have their clear scopes and allows open customization without any vendor lock-in. That is, instrumentations will use the API to create spans and metrics, but do not directly depend on the SDK. This allows instrumentations to be vendor-agnostic. And observability vendors could provide their SDK (or just an exporter) to do whatever propriatary works, like exporting.

I think a Node.js built-in otel module should provide OpenTelemetry specification compliant API and implementations. A built-in OpenTelemetry should not split the OpenTelemetry community.

Copy link
Member

@Qard Qard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small questions/ideas, but otherwise LGTM!

### `OTEL_SERVICE_NAME`

Standard OpenTelemetry environment variable used to set the service name in
exported resource attributes. Defaults to `node-<pid>`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have to load the package.json during startup, don't we? So perhaps we could make this use the name field from it, if present, and fallback to that?

Comment on lines +39 to +54
let timeOriginNs;
function getTimeOriginNs() {
if (timeOriginNs === undefined) {
// getTimeOriginTimestamp() returns ms since Unix epoch.
// Multiply by 1e6 to get nanoseconds.
const originMs = getTimeOriginTimestamp();
timeOriginNs = BigInt(MathRound(originMs * 1e6));
}
return timeOriginNs;
}

function hrTimeToNanos() {
const relativeMs = now();
const ns = getTimeOriginNs() + BigInt(MathRound(relativeMs * 1e6));
return `${ns}`;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering a bit about the perf of this. Did you try a high/low pair like process.hrtime() returns to be able to do the 64-bit timestamp rather than this string approach? I'm a bit worried about the perf implications of lots of bigint timestamps being generated all the time. 🤔

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The OTEL hrtime, as far as I have been able to determine, is not compatible with process.hrtime. It is based upon performance.timeOrigin -- https://github.com/newrelic/node-newrelic/blob/39d1d9a59ca467d71d4a7db4cf437d743a24939b/lib/otel/normalize-timestamp.js#L56-L85

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from performance concerns, is it a good idea to depend entirely on a nonmonotonic clock? This may result in negative duration or out of order spans.

@codecov
Copy link

codecov bot commented Feb 20, 2026

Codecov Report

❌ Patch coverage is 90.18036% with 98 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.75%. Comparing base (9a237cd) to head (ae2b9e3).
⚠️ Report is 46 commits behind head on main.

Files with missing lines Patch % Lines
lib/internal/otel/instrumentations.js 78.75% 50 Missing and 1 partial ⚠️
lib/internal/otel/flush.js 88.48% 34 Missing and 1 partial ⚠️
lib/internal/otel/span.js 93.71% 10 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #61907      +/-   ##
==========================================
+ Coverage   89.73%   89.75%   +0.01%     
==========================================
  Files         675      680       +5     
  Lines      204674   205883    +1209     
  Branches    39330    39565     +235     
==========================================
+ Hits       183667   184781    +1114     
- Misses      13283    13387     +104     
+ Partials     7724     7715       -9     
Files with missing lines Coverage Δ
lib/internal/bootstrap/realm.js 96.21% <100.00%> (+0.21%) ⬆️
lib/internal/otel/core.js 100.00% <100.00%> (ø)
lib/internal/otel/id.js 100.00% <100.00%> (ø)
lib/internal/process/pre_execution.js 97.63% <100.00%> (+1.66%) ⬆️
lib/otel.js 100.00% <100.00%> (ø)
src/node_options.cc 76.30% <100.00%> (+0.02%) ⬆️
src/node_options.h 97.90% <100.00%> (+0.01%) ⬆️
lib/internal/otel/span.js 93.71% <93.71%> (ø)
lib/internal/otel/flush.js 88.48% <88.48%> (ø)
lib/internal/otel/instrumentations.js 78.75% <78.75%> (ø)

... and 56 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

urlStr + '/v1/traces';
}

const parsed = new URL(urlStr);
Copy link

@SerenModz21 SerenModz21 Feb 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition could be simplified to the following:

const parsed = new URL('/v1/traces', endpoint);

Tests with the Node REPL:

> new URL("/v1/traces", "http://localhost:4318/hello?foo=bar").href
'http://localhost:4318/v1/traces'
> new URL("/v1/traces", "http://localhost:4318/").href
'http://localhost:4318/v1/traces'
> new URL("/v1/traces", "http://localhost:4318").href
'http://localhost:4318/v1/traces'
> new URL("/v1/traces", "http://localhost:4318/v1").href
'http://localhost:4318/v1/traces'
> new URL("/v1/traces", "http://localhost:4318/v1/traces").href
'http://localhost:4318/v1/traces'
> new URL("/v1/traces", "http://localhost:4318/v1/traces/").href
'http://localhost:4318/v1/traces'

@dyladan
Copy link

dyladan commented Feb 21, 2026

I have a lot of thoughts about this so I'll try to keep it succinct and direct. In the interest of full disclosure, I have been one of the maintainers of OTel JS for years and that will likely color some of my opinions. I am not on the OTel Governance Committee and the opinions here should be taken as my own and not as the official stance of the project.

I'll start with the things I like. First, I think having built-in support for OTel is great, and focusing on HTTP first makes sense. The lack of interoperability with existing instrumentation means the use of this will be very limited for now, but that's probably fine. Anyone who needs other instrumentation has the option to use the existing otel libraries. Second, i think it is reasonable to only support OTLP. Supporting other export formats is a well intentioned part of OTel, but it was a lot more important in the earlier days when OTLP support was limited. Now all major vendors support OTLP, and the collector serves as a mature routing layer for people who need to use other formats. I would prefer to have OTLP protobuf as an option since it is by far the most used OTLP format, but JSON makes sense as the de facto standard format for JS remote calls and I understand the desire to avoid the protobuf dependency. Third, I do see the value in a reimplementation which is node-specific and avoids the need for various platform-specific workarounds. Because this is shipped with node itself, there is no need to retain workarounds for old node versions or non-node platforms like browsers. Finally, this seems really easy to use. The lack of configuration, customization, and manual instrumentation definitely makes this simpler to understand and use for end users, and because it is built in there is no need to make sure your otel dependency versions align with each other.

I do have a couple issues with the way this is implemented. This seems to have completely done away with most of the features required by the OTel specification. For example, it is missing all of the normal extension points (configuration, propagators, samplers, id generators, span processors, exporters, resource, entities). I'm not sure if this was an unintentional side-effect of vibe coding a replacement SDK or an intentional choice to "avoid bloat," but it really undercuts a lot of the value of OTel. I might even go so far as to say it's not quite accurate to call this "OpenTelemetry support" without those things. It might be better to call this "OTLP span export" or something like that. At the very least I'd make sure you reach out to the OTel Governance Committee to obtain permission to use the OpenTelemetry trademark. I'm not a lawyer and not on the GC, but this is an area where I think it pays to be careful.

Finally, and this is my biggest concern, I agree with @legendecas that this seems like it would fracture the otel ecosystem on node. I am most concerned about the complete lack of interoperability with existing otel libraries. It seems to me that there is definitely a way that this could be done that would retain interoperability and allow a gradual transition to a built in implementation without such a fracture. I understand that obviously node can't take a dependency on an NPM module, but I'm certain the otel JS SDK would not have any issue taking a dependency on built-in modules where the experience could be improved for users.

@bengl
Copy link
Member Author

bengl commented Feb 23, 2026

@legendecas @dyladan

The PR description probably could have used a bit more details. Sorry about that.

First, let me address compatibility/fracturing and support (or lack thereof) for @opentelemetry/api. This PR is intended as an initial phase, supporting OTLP export of internally-created spans for HTTP inbound and outbound only, for now. API support is intended to be handled in a subsequent PR, including support for Span creation/manipulation via @opentelemetry/api, similar to how it's done in Deno. Requiring the usage of a completely different API would not be in scope. Bottom line: fracturing is an anti-goal of this PR, and my intended subsequent PR to add API support, so I'm happy to make whatever changes here that will prevent fracturing and still enable good built-in support.

Second, I agree that calling this PR "OpenTelemetry support" could be misleading if users expect full API support. I'm happy to rename things as appropriate, or move the programmatic activation to some other module (though, I kept it as a new module here knowing that I'd eventually want to export a TracerProvider implementation, etc.). The implementation here is intended to be minimal to avoid bloat and improve performance, particularly for the minimal case it provides for. It can serve as a core for future support for the extension points mentioned via API support, but, of course, that isn't there yet.

The tracing subsystem automatically propagates [W3C Trace Context][] across HTTP
boundaries:

* **Incoming requests**: The `traceparent` header is read from incoming HTTP
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should either include tracestate handling otherwise it's not complete from W3C spec point of view.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With tracestate unused by this internally it may be as simple as propagating it with the span so it can be attached on client spans. No need to parse it for now.


This module is only available under the `node:` scheme. It requires the
`--experimental-otel` CLI flag or one of the `NODE_OTEL` / `NODE_OTEL_ENDPOINT`
environment variables.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should clearly mention the limitations of the current state like

  • incompatible with OTel components from the OpenTelemetry project
  • trace.id/span.id duplication
  • likely overwrite of W3C tracecontext headers in outgoing requests

I guess a user has to decide to use either the one or the other OTel variant and avoid to use both in the same node.js process.


> Stability: 1 - Experimental

When set to a non-empty value, enables the built-in OpenTelemetry tracing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really any value? e.g. having NODE_OTEL=disable or NODE_OTEL=0 to enable it seems a bit uncommon.

Quite some other node env vars check on the value 1.

Uint8Array,
} = primordials;

const { randomFillSync } = require('internal/crypto/random');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it required to use cryptographically strong pseudorandom here? Or is the perf impact don't care?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not required to be crypto random by spec

}
}

function flush() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't flush return a promise to allow users to wait until all data is sent? e.g. before shutdown.

}

if (isModuleEnabled('node:http')) {
sub('http.server.request.start', onHttpServerRequestStart);
Copy link
Member

@IlyasShabi IlyasShabi Feb 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better to check is it's active and module enabled before subscribing


// Generate W3C traceparent header value.
// Format: {version}-{trace-id}-{span-id}-{trace-flags}
inject() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe rename to getTraceParent(). This function doesn't inject anything as of now.

if (typeof value === 'boolean') {
return { boolValue: value };
}
return { stringValue: String(value) };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there intentionally no array support?

lastExportWarningTime = now;
const suffix = exportFailureCount > 1 ?
` (${exportFailureCount} total failures)` : '';
process.emitWarning(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe otel events should be emitted on the otel object instead process.


const transport = parsed.protocol === 'https:' ? https : http;

const req = transport.request({
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider to use an http agent to keep the connection alive for some time.

}
}

function sendToCollector(body) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no throttling, so if a lot spans are ended and/or connection is slow quite a lot parallel requests might be created.

Comment on lines +55 to +58
request[kSpan] = span;

const storage = getSpanStorage();
storage.enterWith(span);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do request[kSpan] if you're also using AsyncLocalStorage? Is this just a performance optimization to avoid calling ALS again on response finish?

@jsumners-nr
Copy link

The implementation here is intended to be minimal to avoid bloat and improve performance, particularly for the minimal case it provides for.

A very welcome goal. We are extremely interested in this moving forward. We haven't even technically released/announced our support for bridging OTEL signals into our agent, and we have already had complaints about the insane bloat the OTEL dependencies add as disk size, and complaints about the memory impacts of just loading them. But we really don't have the time to go off and re-implement the OTEL API+SDK just to elide all of the things the reference implementations bring in.

If we can set up a GitHub project of issues to resolve, that would make it far easier for those interested to contribute and get this feature fully implemented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run.

Projects

None yet

Development

Successfully merging this pull request may close these issues.