Skip to content

chore: give hint of configuration on otel invalid utf8 errors#2551

Open
SkArchon wants to merge 5 commits intomainfrom
milinda/log-otel-errors-with-addition
Open

chore: give hint of configuration on otel invalid utf8 errors#2551
SkArchon wants to merge 5 commits intomainfrom
milinda/log-otel-errors-with-addition

Conversation

@SkArchon
Copy link
Contributor

@SkArchon SkArchon commented Feb 24, 2026

Right now we have options to sanitize span attributes, however this is off by default. Users right now will get the error message of "string field contains invalid UTF-8" when non utf8 data is set into a span anywhere. However users might not know that a workaround exists (right now for span attributes), without logging the error as is, if the error is related to non utf8 data we will log the error with a hint.

Summary by CodeRabbit

  • Bug Fixes

    • Improved telemetry error handling and logging for export failures involving invalid UTF-8, including a clear hint to enable UTF-8 sanitization.
  • Tests

    • Added test coverage simulating invalid UTF-8 export errors to verify sanitization hints are logged.

Checklist

  • I have discussed my proposed changes in an issue and have received approval to proceed.
  • I have followed the coding standards of the project.
  • Tests or benchmarks have been added or updated.
  • Documentation has been updated on https://github.com/wundergraph/cosmo-docs.
  • I have read the Contributors Guide.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 24, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eb1a06e and e79e68e.

📒 Files selected for processing (1)
  • router-tests/telemetry/attribute_processor_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • router-tests/telemetry/attribute_processor_test.go

Walkthrough

Adds an OTLP tracing error helper to detect invalid UTF-8 errors, integrates it into the meter error handler, and adds a test exercising logging of a sanitization hint when an exporter returns an invalid-UTF8 error.

Changes

Cohort / File(s) Summary
Error Handling Utilities
router/pkg/trace/errors.go
New file adding an invalidUTF8Error interface, hasInvalidUTF8Error(err error) bool to inspect error chains, and errHandler(config *ProviderConfig) func(error) that logs a specialized message when invalid UTF-8 is detected and a generic OTLP error otherwise.
Integration
router/pkg/trace/meter.go
Replaces the previous inline otel error handler with errHandler(config) to centralize error handling logic.
Test Coverage
router-tests/telemetry/attribute_processor_test.go
Adds a test case that uses a custom invalidUTF8Exporter and errInvalidUTF8 to simulate an export error containing invalid UTF-8 and asserts that the sanitization hint log message is produced. Also adds small imports and helper methods for the test.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: adding configuration hints when invalid UTF-8 errors occur in OTEL tracing. It directly relates to the core objective of improving user experience when encountering UTF-8-related span attribute errors.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link

github-actions bot commented Feb 24, 2026

Router-nonroot image scan passed

✅ No security vulnerabilities found in image:

ghcr.io/wundergraph/cosmo/router:sha-e10328e6a8fe1fa145640f635a65ae963dcb0721-nonroot

@SkArchon SkArchon changed the title chore: give hint of configuration on utf8 errors chore: give hint of configuration on otel invalid utf8 errors Feb 24, 2026
@codecov
Copy link

codecov bot commented Feb 24, 2026

Codecov Report

❌ Patch coverage is 14.28571% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 25.61%. Comparing base (153432d) to head (871a17d).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
router/pkg/trace/errors.go 7.69% 11 Missing and 1 partial ⚠️

❌ Your patch check has failed because the patch coverage (14.28%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #2551       +/-   ##
===========================================
- Coverage   62.77%   25.61%   -37.17%     
===========================================
  Files         242      243        +1     
  Lines       25560    25571       +11     
===========================================
- Hits        16046     6549     -9497     
- Misses       8178    18405    +10227     
+ Partials     1336      617      -719     
Files with missing lines Coverage Δ
router/pkg/trace/meter.go 26.89% <100.00%> (-14.44%) ⬇️
router/pkg/trace/errors.go 7.69% <7.69%> (ø)

... and 154 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
router/pkg/trace/meter.go (1)

218-222: ⚠️ Potential issue | 🟡 Minor

otel.SetErrorHandler should respect the same test-mode guard as otel.SetTracerProvider.

The comment at Line 217 says "Don't set it globally when we use the router in tests," and SetTracerProvider is correctly guarded by if config.MemoryExporter == nil. But SetErrorHandler on Line 222 is unconditional and always mutates global state, even in test mode.

In parallel test runs every call to NewTracerProvider races to replace the one global handler; the last writer wins. The new test at attribute_processor_test.go:191 specifically relies on errors from its invalidUTF8Exporter reaching its own observer — if another parallel test overwrites the handler between setup and span export the assertion can fail non-deterministically.

🛡️ Proposed fix: apply the same guard
-	otel.SetErrorHandler(otel.ErrorHandlerFunc(errHandler(config)))
+	if config.MemoryExporter == nil {
+		otel.SetErrorHandler(otel.ErrorHandlerFunc(errHandler(config)))
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@router/pkg/trace/meter.go` around lines 218 - 222, The global error handler
is being set unconditionally causing test-time races; guard the
otel.SetErrorHandler call the same way otel.SetTracerProvider is guarded: only
call otel.SetErrorHandler(otel.ErrorHandlerFunc(errHandler(config))) when
config.MemoryExporter == nil (i.e., inside the same if block that sets the
tracer provider) so NewTracerProvider does not mutate global state during tests
and the errHandler/config.MemoryExporter logic remains consistent.
🧹 Nitpick comments (2)
router/pkg/trace/errors.go (1)

8-22: invalidUTF8Error interface approach is sound — worth documenting the coupling.

The interface{ InvalidUTF8() bool } check matches the detection pattern the protobuf team themselves recommend (per the public issue tracker). However, since errInvalidUTF8 lives in google.golang.org/protobuf/internal/impl, there's no stability guarantee — the type or method could be removed in a future protobuf release, causing hasInvalidUTF8Error to silently return false without any compile-time signal. The degradation is graceful (errors still log as plain "otel error"), but a brief comment noting this dependency would help future maintainers.

errors.As with a *invalidUTF8Error (pointer-to-interface) target is valid; the Go stdlib explicitly supports passing a pointer to any interface type.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@router/pkg/trace/errors.go` around lines 8 - 22, Add a short comment above
the invalidUTF8Error type and hasInvalidUTF8Error function that documents the
coupling to google.golang.org/protobuf/internal/impl (errInvalidUTF8) — state
that this check relies on the presence of an internal type exposing
InvalidUTF8(), that the internal package is not guaranteed stable and could be
removed or changed in future protobuf releases, and that in such cases
hasInvalidUTF8Error will gracefully return false; reference the symbols
invalidUTF8Error and hasInvalidUTF8Error and mention that errors.As is
intentionally used with an interface target to detect that condition.
router-tests/telemetry/attribute_processor_test.go (1)

206-212: Redundant double log filtering.

FilterMessageSnippet("sanitize_utf8").All() is called once inside require.Eventually (where the result is discarded) and then again on Lines 211–212. After require.Eventually exits successfully the log entries are guaranteed to exist; the second call is unnecessary.

♻️ Proposed refactor
-		require.Eventually(t, func() bool {
-			logs := xEnv.Observer().FilterMessageSnippet("sanitize_utf8").All()
-			return len(logs) > 0
-		}, 5*time.Second, 100*time.Millisecond)
-
-		logs := xEnv.Observer().FilterMessageSnippet("sanitize_utf8").All()
-		require.NotEmpty(t, logs)
+		var logs []observer.LoggedEntry
+		require.Eventually(t, func() bool {
+			logs = xEnv.Observer().FilterMessageSnippet("sanitize_utf8").All()
+			return len(logs) > 0
+		}, 5*time.Second, 100*time.Millisecond)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@router-tests/telemetry/attribute_processor_test.go` around lines 206 - 212,
The test is calling xEnv.Observer().FilterMessageSnippet("sanitize_utf8").All()
twice (once inside require.Eventually and again after), which is redundant;
declare a local variable logs before the require.Eventually call, assign to logs
inside the closure used by require.Eventually (e.g., logs =
xEnv.Observer().FilterMessageSnippet("sanitize_utf8").All(); return len(logs) >
0), and then after require.Eventually use require.NotEmpty(t, logs) so the log
slice is only retrieved once; references: require.Eventually,
xEnv.Observer().FilterMessageSnippet("sanitize_utf8").All(), require.NotEmpty.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@router-tests/telemetry/attribute_processor_test.go`:
- Line 213: The test calls require.Equal with the actual value first and the
expected string second, which reverses Testify's expected ordering and produces
confusing failure messages; change the call to require.Equal(t, "otel error:
traces export: string field contains invalid UTF-8: Enable
'telemetry.tracing.sanitize_utf8.enabled' in your config to sanitize invalid
UTF-8 attributes.", logs[0].Message) so the literal expectation is the
second-to-last argument and logs[0].Message is the actual (last) argument;
update the invocation that references require.Equal and logs[0].Message
accordingly.

---

Outside diff comments:
In `@router/pkg/trace/meter.go`:
- Around line 218-222: The global error handler is being set unconditionally
causing test-time races; guard the otel.SetErrorHandler call the same way
otel.SetTracerProvider is guarded: only call
otel.SetErrorHandler(otel.ErrorHandlerFunc(errHandler(config))) when
config.MemoryExporter == nil (i.e., inside the same if block that sets the
tracer provider) so NewTracerProvider does not mutate global state during tests
and the errHandler/config.MemoryExporter logic remains consistent.

---

Nitpick comments:
In `@router-tests/telemetry/attribute_processor_test.go`:
- Around line 206-212: The test is calling
xEnv.Observer().FilterMessageSnippet("sanitize_utf8").All() twice (once inside
require.Eventually and again after), which is redundant; declare a local
variable logs before the require.Eventually call, assign to logs inside the
closure used by require.Eventually (e.g., logs =
xEnv.Observer().FilterMessageSnippet("sanitize_utf8").All(); return len(logs) >
0), and then after require.Eventually use require.NotEmpty(t, logs) so the log
slice is only retrieved once; references: require.Eventually,
xEnv.Observer().FilterMessageSnippet("sanitize_utf8").All(), require.NotEmpty.

In `@router/pkg/trace/errors.go`:
- Around line 8-22: Add a short comment above the invalidUTF8Error type and
hasInvalidUTF8Error function that documents the coupling to
google.golang.org/protobuf/internal/impl (errInvalidUTF8) — state that this
check relies on the presence of an internal type exposing InvalidUTF8(), that
the internal package is not guaranteed stable and could be removed or changed in
future protobuf releases, and that in such cases hasInvalidUTF8Error will
gracefully return false; reference the symbols invalidUTF8Error and
hasInvalidUTF8Error and mention that errors.As is intentionally used with an
interface target to detect that condition.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 153432d and eb1a06e.

📒 Files selected for processing (3)
  • router-tests/telemetry/attribute_processor_test.go
  • router/pkg/trace/errors.go
  • router/pkg/trace/meter.go

@endigma
Copy link
Member

endigma commented Feb 24, 2026

Is there a downside to turning it on by default? Seems like if it inevitably just makes errors we should do this implicitly.

@StarpTech
Copy link
Contributor

Is there a downside to turning it on by default? Seems like if it inevitably just makes errors we should do this implicitly.

@SkArchon is validating the overhead. If negligible, I'd also suggest enabling it by default.

@SkArchon
Copy link
Contributor Author

SkArchon commented Feb 25, 2026

It takes about 14.70089313 microseconds per request (with no custom metric configuration, more string attributes would mean this increases), lets assume the average is about 140ns per attribute (this is only applied to string attributes). While fast, since this is on the hot path and as only one customer has reported it, I would prefer having the log message, that way when folks do see the log message they are almost likely to go and enable this.

Copy link
Contributor

@StarpTech StarpTech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants