Split HTTP error metrics into separate 4xx and 5xx counters by rdimitrov · Pull Request #959 · modelcontextprotocol/registry

rdimitrov · 2026-02-15T00:40:30Z

Motivation and Context

The following PR split the metrics into separate 4xx and 5xx counters

Details:

Separate mcp_registry_http_errors_total to only count 5xx server errors
Add new mcp_registry_http_client_errors_total counter for 4xx client errors
Previously all 4xx and 5xx were counted together under errors making it harder to distinguish real server issues from client validation errors (e.g. 422s were 93% of "errors")

How Has This Been Tested?

Breaking Changes

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation update

Checklist

I have read the MCP Documentation
My code follows the repository's style guidelines
New and existing tests pass locally
I have added appropriate error handling
I have added or updated documentation as needed

Additional context

Signed-off-by: Radoslav Dimitrov <[email protected]>

rdimitrov · 2026-02-15T00:52:28Z

@claude - hey, can you review that

claude · 2026-02-15T00:52:43Z

Claude finished @rdimitrov's task in 54s —— View job

Code Review

Tasks

Read and analyze the changed files
Review the metrics implementation
Provide comprehensive feedback

Summary

This PR successfully splits HTTP error metrics into separate 4xx and 5xx counters, improving observability by distinguishing client errors from server errors. The implementation is clean and follows Prometheus best practices.

✅ What's Good

1. Clear separation of concerns

The new ClientErrorCount metric properly tracks 4xx errors
The existing ErrorCount metric now correctly tracks only 5xx errors
Metric names follow Prometheus naming conventions (http.errors for server errors, http.client_errors for client errors)

2. Correct implementation in middleware (router.go:80-84)

if statusCode >= 500 {
    metrics.ErrorCount.Add(ctx.Context(), 1, metric.WithAttributes(attrs...))
} else if statusCode >= 400 {
    metrics.ClientErrorCount.Add(ctx.Context(), 1, metric.WithAttributes(attrs...))
}

The logic correctly handles the full range of status codes:

5xx → ErrorCount
4xx → ClientErrorCount
2xx/3xx → no error metric (correct)

3. Updated descriptions

Comments and metric descriptions clearly document what each counter tracks
This helps with metric interpretation in monitoring dashboards

🔍 Observations

1. Metric naming convention
The new metric names are:

mcp_registry.http.errors (5xx only)
mcp_registry.http.client_errors (4xx only)

This is a breaking change for existing dashboards/alerts that rely on mcp_registry_http_errors_total (Prometheus exported name). The old metric counted both 4xx and 5xx together.

Impact:

✅ Good: More precise alerting (e.g., alert only on 5xx server errors)
⚠️ Breaking: Existing queries for mcp_registry_http_errors_total will now only show 5xx errors
📊 Migration needed: Dashboards/alerts need updating to use both metrics if they want total error count

2. Alternative naming consideration
The metric names could be more parallel:

Current: http.errors and http.client_errors
Alternative: http.server_errors and http.client_errors

This would make the naming more symmetric, though the current naming follows the common convention where "errors" implies server errors.

💡 Suggestions

1. Consider documenting the breaking change

Since this changes the semantics of an existing metric, it would be helpful to:

Add a note in the PR description about the breaking change
Consider if this warrants a changelog entry
Check if any existing dashboards/alerts need updating

2. Optional: Add a combined metric for backward compatibility

If you want to maintain backward compatibility, you could add a third metric that combines both:

// TotalErrorCount tracks all errors (4xx + 5xx)
TotalErrorCount metric.Int64Counter

However, given that this is an internal registry and the split provides better observability, the breaking change seems justified.

✨ Verdict

LGTM with minor documentation suggestion!

The implementation is correct, clean, and improves observability. The only consideration is documenting the breaking change for anyone monitoring the old http_errors_total metric. The separation of 4xx and 5xx errors will make it much easier to distinguish actual server problems from client validation errors.

Great work on improving the metrics! 🎉

| Branch: split-metrics

pree-dew · 2026-02-15T08:05:34Z

@rdimitrov In order to identify server issues, we can use status_code label in metric
mcp_registry_http_errors_total{status_code=~"5.*"}

Generally standard is to use status_code to distinguish different kind of issues instead of creating separate metric until we are facing high cardinality issues with single metric.

rdimitrov · 2026-02-15T08:43:58Z

@rdimitrov In order to identify server issues, we can use status_code label in metric mcp_registry_http_errors_total{status_code=~"5.*"}

Generally standard is to use status_code to distinguish different kind of issues instead of creating separate metric until we are facing high cardinality issues with single metric.

@pree-dew - Thanks for the feedback! 🙏 That does make the PR redundant so I'll close it 👍 Is there something on the dashboard side that we should do? Also as a side note what do you think about moving the dashboards to a more gitopsy way of managing them, i.e. configmaps stored in the repo?

pree-dew · 2026-02-15T11:26:49Z

@rdimitrov I will add panel for server side and client side error on the dashboard to help with this issue.
I think, if we have very limited access to dashboard via UI then we can continue with that as it doesn’t change so frequently and allows quick fixes to noise, allow easy testing and give quick feedback for changes, config for dashboard might make it bit hard as there is no visible way to check the configuration apart from local.

rdimitrov · 2026-02-15T11:37:55Z

@rdimitrov I will add panel for server side and client side error on the dashboard to help with this issue.

Thanks!

I think, if we have very limited access to dashboard via UI then we can continue with that as it doesn’t change so frequently and allows quick fixes to noise, allow easy testing and give quick feedback for changes, config for dashboard might make it bit hard as there is no visible way to check the configuration apart from local.

Sounds good! 🙏

Split the 4xx/5xx metrics

4384b23

Signed-off-by: Radoslav Dimitrov <[email protected]>

rdimitrov closed this Feb 15, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Split HTTP error metrics into separate 4xx and 5xx counters#959

Split HTTP error metrics into separate 4xx and 5xx counters#959
rdimitrov wants to merge 1 commit intomainfrom
split-metrics

rdimitrov commented Feb 15, 2026

Uh oh!

rdimitrov commented Feb 15, 2026

Uh oh!

claude bot commented Feb 15, 2026 •

edited

Loading

Uh oh!

pree-dew commented Feb 15, 2026

Uh oh!

rdimitrov commented Feb 15, 2026

Uh oh!

pree-dew commented Feb 15, 2026

Uh oh!

rdimitrov commented Feb 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

rdimitrov commented Feb 15, 2026

Motivation and Context

How Has This Been Tested?

Breaking Changes

Types of changes

Checklist

Additional context

Uh oh!

rdimitrov commented Feb 15, 2026

Uh oh!

claude bot commented Feb 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review

Tasks

Summary

✅ What's Good

🔍 Observations

💡 Suggestions

✨ Verdict

Uh oh!

pree-dew commented Feb 15, 2026

Uh oh!

rdimitrov commented Feb 15, 2026

Uh oh!

pree-dew commented Feb 15, 2026

Uh oh!

rdimitrov commented Feb 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

claude bot commented Feb 15, 2026 •

edited

Loading