chore: give hint of configuration on otel invalid utf8 errors by SkArchon · Pull Request #2551 · wundergraph/cosmo

SkArchon · 2026-02-24T14:32:42Z

Right now we have options to sanitize span attributes, however this is off by default. Users right now will get the error message of "string field contains invalid UTF-8" when non utf8 data is set into a span anywhere. However users might not know that a workaround exists (right now for span attributes), without logging the error as is, if the error is related to non utf8 data we will log the error with a hint.

Summary by CodeRabbit

Bug Fixes
- Improved telemetry error handling and logging for export failures involving invalid UTF-8, including a clear hint to enable UTF-8 sanitization.
Tests
- Added test coverage simulating invalid UTF-8 export errors to verify sanitization hints are logged.

Checklist

I have discussed my proposed changes in an issue and have received approval to proceed.
I have followed the coding standards of the project.
Tests or benchmarks have been added or updated.
Documentation has been updated on https://github.com/wundergraph/cosmo-docs.
I have read the Contributors Guide.

coderabbitai · 2026-02-24T14:34:48Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eb1a06e and e79e68e.

📒 Files selected for processing (1)

router-tests/telemetry/attribute_processor_test.go

🚧 Files skipped from review as they are similar to previous changes (1)

router-tests/telemetry/attribute_processor_test.go

Walkthrough

Adds an OTLP tracing error helper to detect invalid UTF-8 errors, integrates it into the meter error handler, and adds a test exercising logging of a sanitization hint when an exporter returns an invalid-UTF8 error.

Changes

Cohort / File(s)	Summary
Error Handling Utilities `router/pkg/trace/errors.go`	New file adding an `invalidUTF8Error` interface, `hasInvalidUTF8Error(err error) bool` to inspect error chains, and `errHandler(config *ProviderConfig) func(error)` that logs a specialized message when invalid UTF-8 is detected and a generic OTLP error otherwise.
Integration `router/pkg/trace/meter.go`	Replaces the previous inline otel error handler with `errHandler(config)` to centralize error handling logic.
Test Coverage `router-tests/telemetry/attribute_processor_test.go`	Adds a test case that uses a custom `invalidUTF8Exporter` and `errInvalidUTF8` to simulate an export error containing invalid UTF-8 and asserts that the sanitization hint log message is produced. Also adds small imports and helper methods for the test.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main change: adding configuration hints when invalid UTF-8 errors occur in OTEL tracing. It directly relates to the core objective of improving user experience when encountering UTF-8-related span attribute errors.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings (stacked PR)
📝 Generate docstrings (commit on current branch)

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-02-24T14:37:14Z

Router-nonroot image scan passed

✅ No security vulnerabilities found in image:

ghcr.io/wundergraph/cosmo/router:sha-e10328e6a8fe1fa145640f635a65ae963dcb0721-nonroot

codecov · 2026-02-24T14:41:36Z

Codecov Report

❌ Patch coverage is 14.28571% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 25.61%. Comparing base (153432d) to head (871a17d).
⚠️ Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
router/pkg/trace/errors.go	7.69%	11 Missing and 1 partial ⚠️

❌ Your patch check has failed because the patch coverage (14.28%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

@@             Coverage Diff             @@
##             main    #2551       +/-   ##
===========================================
- Coverage   62.77%   25.61%   -37.17%     
===========================================
  Files         242      243        +1     
  Lines       25560    25571       +11     
===========================================
- Hits        16046     6549     -9497     
- Misses       8178    18405    +10227     
+ Partials     1336      617      -719

Files with missing lines	Coverage Δ
router/pkg/trace/meter.go	`26.89% <100.00%> (-14.44%)`	⬇️
router/pkg/trace/errors.go	`7.69% <7.69%> (ø)`

... and 154 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

router/pkg/trace/meter.go (1)
218-222: ⚠️ Potential issue | 🟡 Minor

otel.SetErrorHandler should respect the same test-mode guard as otel.SetTracerProvider.

The comment at Line 217 says "Don't set it globally when we use the router in tests," and SetTracerProvider is correctly guarded by if config.MemoryExporter == nil. But SetErrorHandler on Line 222 is unconditional and always mutates global state, even in test mode.

In parallel test runs every call to NewTracerProvider races to replace the one global handler; the last writer wins. The new test at attribute_processor_test.go:191 specifically relies on errors from its invalidUTF8Exporter reaching its own observer — if another parallel test overwrites the handler between setup and span export the assertion can fail non-deterministically.
🛡️ Proposed fix: apply the same guard
-	otel.SetErrorHandler(otel.ErrorHandlerFunc(errHandler(config)))
+	if config.MemoryExporter == nil {
+		otel.SetErrorHandler(otel.ErrorHandlerFunc(errHandler(config)))
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@router/pkg/trace/meter.go` around lines 218 - 222, The global error handler
is being set unconditionally causing test-time races; guard the
otel.SetErrorHandler call the same way otel.SetTracerProvider is guarded: only
call otel.SetErrorHandler(otel.ErrorHandlerFunc(errHandler(config))) when
config.MemoryExporter == nil (i.e., inside the same if block that sets the
tracer provider) so NewTracerProvider does not mutate global state during tests
and the errHandler/config.MemoryExporter logic remains consistent.

🧹 Nitpick comments (2)

router/pkg/trace/errors.go (1)
8-22: invalidUTF8Error interface approach is sound — worth documenting the coupling.

The interface{ InvalidUTF8() bool } check matches the detection pattern the protobuf team themselves recommend (per the public issue tracker). However, since errInvalidUTF8 lives in google.golang.org/protobuf/internal/impl, there's no stability guarantee — the type or method could be removed in a future protobuf release, causing hasInvalidUTF8Error to silently return false without any compile-time signal. The degradation is graceful (errors still log as plain "otel error"), but a brief comment noting this dependency would help future maintainers.

errors.As with a *invalidUTF8Error (pointer-to-interface) target is valid; the Go stdlib explicitly supports passing a pointer to any interface type.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@router/pkg/trace/errors.go` around lines 8 - 22, Add a short comment above
the invalidUTF8Error type and hasInvalidUTF8Error function that documents the
coupling to google.golang.org/protobuf/internal/impl (errInvalidUTF8) — state
that this check relies on the presence of an internal type exposing
InvalidUTF8(), that the internal package is not guaranteed stable and could be
removed or changed in future protobuf releases, and that in such cases
hasInvalidUTF8Error will gracefully return false; reference the symbols
invalidUTF8Error and hasInvalidUTF8Error and mention that errors.As is
intentionally used with an interface target to detect that condition.
router-tests/telemetry/attribute_processor_test.go (1)
206-212: Redundant double log filtering.

FilterMessageSnippet("sanitize_utf8").All() is called once inside require.Eventually (where the result is discarded) and then again on Lines 211–212. After require.Eventually exits successfully the log entries are guaranteed to exist; the second call is unnecessary.
♻️ Proposed refactor
-		require.Eventually(t, func() bool {
-			logs := xEnv.Observer().FilterMessageSnippet("sanitize_utf8").All()
-			return len(logs) > 0
-		}, 5*time.Second, 100*time.Millisecond)
-
-		logs := xEnv.Observer().FilterMessageSnippet("sanitize_utf8").All()
-		require.NotEmpty(t, logs)
+		var logs []observer.LoggedEntry
+		require.Eventually(t, func() bool {
+			logs = xEnv.Observer().FilterMessageSnippet("sanitize_utf8").All()
+			return len(logs) > 0
+		}, 5*time.Second, 100*time.Millisecond)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@router-tests/telemetry/attribute_processor_test.go` around lines 206 - 212,
The test is calling xEnv.Observer().FilterMessageSnippet("sanitize_utf8").All()
twice (once inside require.Eventually and again after), which is redundant;
declare a local variable logs before the require.Eventually call, assign to logs
inside the closure used by require.Eventually (e.g., logs =
xEnv.Observer().FilterMessageSnippet("sanitize_utf8").All(); return len(logs) >
0), and then after require.Eventually use require.NotEmpty(t, logs) so the log
slice is only retrieved once; references: require.Eventually,
xEnv.Observer().FilterMessageSnippet("sanitize_utf8").All(), require.NotEmpty.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@router-tests/telemetry/attribute_processor_test.go`:
- Line 213: The test calls require.Equal with the actual value first and the
expected string second, which reverses Testify's expected ordering and produces
confusing failure messages; change the call to require.Equal(t, "otel error:
traces export: string field contains invalid UTF-8: Enable
'telemetry.tracing.sanitize_utf8.enabled' in your config to sanitize invalid
UTF-8 attributes.", logs[0].Message) so the literal expectation is the
second-to-last argument and logs[0].Message is the actual (last) argument;
update the invocation that references require.Equal and logs[0].Message
accordingly.

---

Outside diff comments:
In `@router/pkg/trace/meter.go`:
- Around line 218-222: The global error handler is being set unconditionally
causing test-time races; guard the otel.SetErrorHandler call the same way
otel.SetTracerProvider is guarded: only call
otel.SetErrorHandler(otel.ErrorHandlerFunc(errHandler(config))) when
config.MemoryExporter == nil (i.e., inside the same if block that sets the
tracer provider) so NewTracerProvider does not mutate global state during tests
and the errHandler/config.MemoryExporter logic remains consistent.

---

Nitpick comments:
In `@router-tests/telemetry/attribute_processor_test.go`:
- Around line 206-212: The test is calling
xEnv.Observer().FilterMessageSnippet("sanitize_utf8").All() twice (once inside
require.Eventually and again after), which is redundant; declare a local
variable logs before the require.Eventually call, assign to logs inside the
closure used by require.Eventually (e.g., logs =
xEnv.Observer().FilterMessageSnippet("sanitize_utf8").All(); return len(logs) >
0), and then after require.Eventually use require.NotEmpty(t, logs) so the log
slice is only retrieved once; references: require.Eventually,
xEnv.Observer().FilterMessageSnippet("sanitize_utf8").All(), require.NotEmpty.

In `@router/pkg/trace/errors.go`:
- Around line 8-22: Add a short comment above the invalidUTF8Error type and
hasInvalidUTF8Error function that documents the coupling to
google.golang.org/protobuf/internal/impl (errInvalidUTF8) — state that this
check relies on the presence of an internal type exposing InvalidUTF8(), that
the internal package is not guaranteed stable and could be removed or changed in
future protobuf releases, and that in such cases hasInvalidUTF8Error will
gracefully return false; reference the symbols invalidUTF8Error and
hasInvalidUTF8Error and mention that errors.As is intentionally used with an
interface target to detect that condition.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 153432d and eb1a06e.

📒 Files selected for processing (3)

router-tests/telemetry/attribute_processor_test.go
router/pkg/trace/errors.go
router/pkg/trace/meter.go

router-tests/telemetry/attribute_processor_test.go

router/pkg/trace/errors.go

endigma · 2026-02-24T19:56:41Z

Is there a downside to turning it on by default? Seems like if it inevitably just makes errors we should do this implicitly.

StarpTech · 2026-02-24T21:54:00Z

Is there a downside to turning it on by default? Seems like if it inevitably just makes errors we should do this implicitly.

@SkArchon is validating the overhead. If negligible, I'd also suggest enabling it by default.

SkArchon · 2026-02-25T10:36:37Z

It takes about 14.70089313 microseconds per request (with no custom metric configuration, more string attributes would mean this increases), lets assume the average is about 140ns per attribute (this is only applied to string attributes). While fast, since this is on the hot path and as only one customer has reported it, I would prefer having the log message, that way when folks do see the log message they are almost likely to go and enable this.

StarpTech

LGTM

fix: log configuration

c41d92c

SkArchon requested review from Noroth, StarpTech, devsergiy, endigma and jensneuse as code owners February 24, 2026 14:32

github-actions bot added the router label Feb 24, 2026

fix: reword hints

eb1a06e

SkArchon changed the title ~~chore: give hint of configuration on utf8 errors~~ chore: give hint of configuration on otel invalid utf8 errors Feb 24, 2026

coderabbitai bot reviewed Feb 24, 2026

View reviewed changes

router-tests/telemetry/attribute_processor_test.go Outdated Show resolved Hide resolved

StarpTech requested changes Feb 24, 2026

View reviewed changes

router/pkg/trace/errors.go Show resolved Hide resolved

Merge branch 'main' into milinda/log-otel-errors-with-addition

871a17d

SkArchon and others added 2 commits February 25, 2026 16:07

fix: review comments

17654dc

Merge branch 'main' into milinda/log-otel-errors-with-addition

e79e68e

StarpTech approved these changes Feb 25, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: give hint of configuration on otel invalid utf8 errors#2551

chore: give hint of configuration on otel invalid utf8 errors#2551
SkArchon wants to merge 5 commits intomainfrom
milinda/log-otel-errors-with-addition

SkArchon commented Feb 24, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Feb 24, 2026 •

edited

Loading

❌ Failed checks (1 warning)

Uh oh!

github-actions bot commented Feb 24, 2026 •

edited

Loading

Uh oh!

codecov bot commented Feb 24, 2026 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

endigma commented Feb 24, 2026 •

edited

Loading

Uh oh!

StarpTech commented Feb 24, 2026

Uh oh!

SkArchon commented Feb 25, 2026 •

edited

Loading

Uh oh!

StarpTech left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

SkArchon commented Feb 24, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Checklist

Uh oh!

coderabbitai bot commented Feb 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

github-actions bot commented Feb 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Router-nonroot image scan passed

Uh oh!

codecov bot commented Feb 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

endigma commented Feb 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

StarpTech commented Feb 24, 2026

Uh oh!

SkArchon commented Feb 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

StarpTech left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

SkArchon commented Feb 24, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Feb 24, 2026 •

edited

Loading

github-actions bot commented Feb 24, 2026 •

edited

Loading

codecov bot commented Feb 24, 2026 •

edited

Loading

endigma commented Feb 24, 2026 •

edited

Loading

SkArchon commented Feb 25, 2026 •

edited

Loading