feat: Integrate OpenTelemetry for distributed tracing and metrics#186
feat: Integrate OpenTelemetry for distributed tracing and metrics#186KitAndrew wants to merge 3 commits intocrossoverJie:masterfrom
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review infoConfiguration used: defaults Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds OpenTelemetry tracing and Micrometer OTLP metrics across services: dependency updates, new OtelConfig classes, Tracer and Counter beans, instrumentation of route and server flows with spans and counters, application YAML OTLP settings, and Jaeger runtime additions in Docker configs. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant RouteController
participant CIMServer
participant OpenTelemetry as Tracer
participant OTLP
Client->>RouteController: send message / login / register
activate RouteController
RouteController->>OpenTelemetry: start span (route)
RouteController->>CIMServer: forward message / request
activate CIMServer
CIMServer->>OpenTelemetry: start span (server)
CIMServer->>CIMServer: write socket / update session / metrics++
CIMServer->>OpenTelemetry: set attributes, set status, end span
deactivate CIMServer
RouteController->>OpenTelemetry: set attributes, set status, end span
deactivate RouteController
OpenTelemetry->>OTLP: BatchSpanProcessor exports spans
OTLP->>OTLP: Jaeger stores & displays traces
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs). Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 13
🤖 Fix all issues with AI agents
In
`@cim-forward-route/src/main/java/com/crossoverjie/cim/route/config/OtelConfig.java`:
- Around line 50-52: The endpoint derivation using otlpEndpoint.replace(...) is
fragile; introduce a dedicated gRPC endpoint configuration (e.g., a new property
name like otlpGrpcEndpoint) and use that when building the OtlpGrpcSpanExporter
instead of mutating otlpEndpoint. Update OtelConfig to read the new
otlpGrpcEndpoint (with a sensible default or explicit validation) and construct
OtlpGrpcSpanExporter.builder().setEndpoint(otlpGrpcEndpoint).build(); leave
otlpEndpoint for HTTP exporters and ensure any existing callers/configs are
updated to provide the new property.
In
`@cim-forward-route/src/main/java/com/crossoverjie/cim/route/controller/RouteController.java`:
- Around line 186-189: In RouteController's p2pRoute method catch block for
CIMException you only call span.setStatus(...) and set the response, so
exception details aren't recorded in tracing; add span.recordException(e)
alongside span.setStatus(StatusCode.ERROR, e.getErrorMessage()) (matching the
pattern used in groupRoute and login) so the exception and stack trace are
emitted to Jaeger when p2pRoute fails.
- Around line 235-238: The early return when status != StatusEnum.SUCCESS sets
span attribute "loginStatus" but leaves the span status UNSET; before returning
(in the failure branch in RouteController where you set
span.setAttribute("loginStatus", "failed")), also set the span status to an
error (e.g., span.setStatus(...) with an ERROR/StatusCode.ERROR and optional
message) so the finally block that ends the span records a non-OK status and
failed logins are visible in Jaeger.
- Around line 101-102: The created Span in RouteController (Span span =
tracer.spanBuilder("cim.route.groupMsg").startSpan() inside groupRoute — and
similarly in p2pRoute and login) is never made current, breaking context
propagation; fix by making the span current with span.makeCurrent() (assign to a
Scope, e.g., try-with-resources Scope scope = span.makeCurrent()) before
executing downstream work and ensure the Scope is closed and span.end() is
always called in a finally block so child spans and instrumented libraries
inherit the context.
In `@cim-forward-route/src/main/resources/application.yaml`:
- Around line 73-82: The OTLP metrics property path is incorrect and tracing
sampling is hard-coded; change the metrics config key from
management.otlp.metrics.endpoint to management.otlp.metrics.export.url (per
Spring Boot/Micrometer OTLP) and externalize the tracing sampling probability
(management.tracing.sampling.probability) by reading it from an environment
variable or profile-specific property (e.g.,
${TRACING_SAMPLING_PROBABILITY:0.1}) or a non-dev profile override so production
does not trace 100% of requests.
In `@cim-server/src/main/java/com/crossoverjie/cim/server/config/OtelConfig.java`:
- Around line 47-52: The static Counter fields loginCounter, heartbeatCounter,
and messagePushedCounter in OtelConfig are written during Spring initialization
and later read from Netty event-loop threads, so add the volatile modifier to
each static field declaration to ensure visibility across threads (Lombok's
`@Getter` can stay as-is); update the declarations of loginCounter,
heartbeatCounter, and messagePushedCounter to be private static volatile Counter
... so Netty threads reliably see the initialized non-null counters.
In
`@cim-server/src/main/java/com/crossoverjie/cim/server/handle/CIMServerHandle.java`:
- Around line 93-101: The empty if block after creating Span (Span span in
CIMServerHandle) should actually make the span current: call span.makeCurrent()
and assign to a Scope (e.g., Scope scope) when span != null, and ensure the
Scope is closed and span.end() in the finally block; alternatively remove the
misleading if block—prefer implementing Scope scope = span.makeCurrent() inside
the try and closing scope.close() (and ending span) in the finally to ensure
proper context propagation.
In `@cim-server/src/main/resources/application.yaml`:
- Around line 40-52: The YAML uses an unrecognized property for OTLP metrics:
replace management.otlp.metrics.endpoint with the correct nested property
management.otlp.metrics.export.url so that metrics export is configured as
management -> otlp -> metrics -> export -> url (keep
management.otlp.tracing.endpoint as-is); update the application.yaml to reflect
this property name change so Spring Boot 3.x will actually export metrics to the
OTLP collector.
In `@docker/allin1-ubuntu.Dockerfile`:
- Around line 45-46: The EXPOSE directive currently lists only Jaeger/OTel ports
(16686, 4317, 4318) but omits Zookeeper and Redis; update the EXPOSE instruction
in the Dockerfile (modify the EXPOSE line) to include ports 2181 and 6379 so it
reads to expose 16686 4317 4318 2181 6379, and verify whether port 8083 should
be added based on the application before including it.
- Around line 13-18: The Dockerfile currently hardcodes the Jaeger artifact
"jaeger-1.57.0-linux-amd64.tar.gz" and will fail for non-amd64 builds; change
the step to accept a build ARG TARGETARCH, map TARGETARCH values to the correct
Jaeger archive arch string (e.g., use "amd64" for TARGETARCH=amd64 and "arm64"
for TARGETARCH=arm64, and fail/skip with a clear message for unsupported
TARGETARCH like linux/arm/v7), then use that mapped arch when composing the
download filename and extracting the "jaeger-all-in-one" binary so the RUN
wget/tar/mv/rm sequence fetches the correct platform-specific tarball.
In `@docker/supervisord.conf`:
- Around line 34-38: The supervisord [program:server] startup sequence makes
Jaeger/OTLP a hard dependency by calling /wait-for-it.sh on port 4317 before
launching java -jar /cim-server.jar; update the command in the supervisord
[program:server] (and any analogous [program:cim-route]) to remove the
/wait-for-it.sh -t 30 localhost:4317 check (or if you must keep it, increase the
timeout significantly and add a clear comment documenting that Jaeger is
required), leaving only the Redis (6379) and Zookeeper (2181) waits so the OTLP
exporter can retry/buffer without blocking application start.
In `@pom.xml`:
- Around line 26-28: The POM declares redundant properties
micrometer-tracing.version and micrometer.version (matching Spring Boot 3.3.0
managed versions) and a mismatched opentelemetry.version; remove the
micrometer-tracing.version and micrometer.version properties from the pom.xml so
Spring Boot's dependency management controls those versions, and either change
opentelemetry.version to 1.37.0 to match Spring Boot 3.3.0 or remove the
opentelemetry.version property if you intend to use the Boot-managed version
instead (check for any intentional override before removal).
In `@README.md`:
- Line 25: The checklist item "Integrate OpenTelemetry ." has an extra space
before the period; update that checklist line (the "- [x] Integrate
OpenTelemetry ." entry) to remove the extra space so it reads "Integrate
OpenTelemetry." to correct the punctuation.
🧹 Nitpick comments (6)
docker/supervisord.conf (1)
24-32: Nit: Redundant OTLP enablement — both CLI flag and environment variable are set.Line 25 passes
--collector.otlp.enabled=trueand Line 26 sets the equivalentCOLLECTOR_OTLP_ENABLED="true"env var. One of them is sufficient.cim-forward-route/src/main/java/com/crossoverjie/cim/route/config/OtelConfig.java (2)
115-121:tracerProvider.shutdown()is fire-and-forget — pending spans may be lost.
SdkTracerProvider.shutdown()returns aCompletableResultCode. Without awaiting its completion, in-flight spans in theBatchSpanProcessormay not be flushed before the JVM exits.Proposed fix: await shutdown completion
`@PreDestroy` public void shutdown() { if (tracerProvider != null) { - tracerProvider.shutdown(); + tracerProvider.shutdown().join(10, java.util.concurrent.TimeUnit.SECONDS); log.info("OpenTelemetry TracerProvider shut down"); } }
1-121: Significant code duplication withcim-server'sOtelConfig.The OTLP SDK initialization, tracer bean creation, and shutdown logic are nearly identical between this class and
cim-server/…/OtelConfig.java. Consider extracting a shared base configuration (e.g., incim-common) parameterized byservice.name, with module-specific subclasses or configs only defining their custom metrics.cim-server/src/main/java/com/crossoverjie/cim/server/config/OtelConfig.java (1)
116-141:@Beanreturning a dummyStringto trigger metric registration is a code smell.Returning
"metricsRegistered"registers an unrelatedStringbean in the context and obscures the intent. A@PostConstructmethod on this@Configurationclass (withMeterRegistryinjected as a field) or aSmartInitializingSingletonwould be more idiomatic.Suggested alternative
+ `@Resource` + private MeterRegistry meterRegistry; + - `@Bean` - public String registerMetrics(MeterRegistry meterRegistry) { + `@PostConstruct` + public void registerMetrics() { Gauge.builder("cim.server.online.users", () -> SessionSocketHolder.getRelationShip().size()) .description("Current number of online users connected to this server") .register(meterRegistry); loginCounter = Counter.builder("cim.server.login.count") .description("Total number of login requests processed") .register(meterRegistry); heartbeatCounter = Counter.builder("cim.server.heartbeat.count") .description("Total number of heartbeat (PING) messages received") .register(meterRegistry); messagePushedCounter = Counter.builder("cim.server.messages.pushed") .description("Total number of messages pushed to clients") .register(meterRegistry); log.info("CIM server custom metrics registered"); - return "metricsRegistered"; }cim-server/src/main/java/com/crossoverjie/cim/server/handle/CIMServerHandle.java (1)
41-47: Per-requestSpringBeanFactory.getBean()lookup for theTracer.
getTracer()performs a Spring context lookup on everyLOGIN_REQUEST. While singletons are cached, the lookup still traverses the bean factory. Cache the result in avolatilefield after the first successful retrieval.Proposed fix
+ private volatile Tracer cachedTracer; + private Tracer getTracer() { + if (cachedTracer != null) { + return cachedTracer; + } try { - return SpringBeanFactory.getBean(Tracer.class); + cachedTracer = SpringBeanFactory.getBean(Tracer.class); + return cachedTracer; } catch (Exception e) { return null; } }cim-server/src/main/java/com/crossoverjie/cim/server/server/CIMServer.java (1)
85-139: Span ends before the asyncwriteAndFlushcompletes.
span.end()infinally(line 137) executes immediately aftersocketChannel.writeAndFlush(protocol)is called, but the actual network write hasn't completed yet — theChannelFutureListenerfires asynchronously later. This means the span duration does not reflect actual write latency or failures. If accurate push timing matters, consider ending the span inside the listener callback instead.
cim-forward-route/src/main/java/com/crossoverjie/cim/route/config/OtelConfig.java
Show resolved
Hide resolved
cim-forward-route/src/main/java/com/crossoverjie/cim/route/controller/RouteController.java
Show resolved
Hide resolved
cim-forward-route/src/main/java/com/crossoverjie/cim/route/controller/RouteController.java
Show resolved
Hide resolved
| if (status != StatusEnum.SUCCESS) { | ||
| span.setAttribute("loginStatus", "failed"); | ||
| return res; | ||
| } |
There was a problem hiding this comment.
Login early-return leaves span status as UNSET.
When status != SUCCESS, the method returns early at line 237. The span is ended in finally, but its status is never explicitly set (only the attribute loginStatus=failed is recorded). In Jaeger, this span will appear without a clear OK/ERROR status, making it harder to filter and alert on failed logins.
Proposed fix
if (status != StatusEnum.SUCCESS) {
span.setAttribute("loginStatus", "failed");
+ span.setStatus(StatusCode.OK); // request itself succeeded; login was rejected by business logic
return res;
}🤖 Prompt for AI Agents
In
`@cim-forward-route/src/main/java/com/crossoverjie/cim/route/controller/RouteController.java`
around lines 235 - 238, The early return when status != StatusEnum.SUCCESS sets
span attribute "loginStatus" but leaves the span status UNSET; before returning
(in the failure branch in RouteController where you set
span.setAttribute("loginStatus", "failed")), also set the span status to an
error (e.g., span.setStatus(...) with an ERROR/StatusCode.ERROR and optional
message) so the finally block that ends the span records a non-OK status and
failed logins are visible in Jaeger.
docker/allin1-ubuntu.Dockerfile
Outdated
| # install Jaeger all-in-one (OpenTelemetry trace backend) | ||
| # Jaeger UI: 16686, OTLP gRPC: 4317, OTLP HTTP: 4318 | ||
| RUN wget -q https://github.com/jaegertracing/jaeger/releases/download/v1.57.0/jaeger-1.57.0-linux-amd64.tar.gz \ | ||
| && tar -xzf jaeger-1.57.0-linux-amd64.tar.gz \ | ||
| && mv jaeger-1.57.0-linux-amd64/jaeger-all-in-one /usr/local/bin/ \ | ||
| && rm -rf jaeger-1.57.0-linux-amd64* |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check Jaeger v1.57.0 available architectures
curl -sL https://api.github.com/repos/jaegertracing/jaeger/releases/tags/v1.57.0 | \
jq -r '.assets[].name' | grep -i 'linux'Repository: crossoverJie/cim
Length of output: 505
🏁 Script executed:
#!/bin/bash
# Get more details on Jaeger release to understand available architectures
curl -sL https://api.github.com/repos/jaegertracing/jaeger/releases/tags/v1.57.0 | \
jq -r '.assets[] | "\(.name)"' | sortRepository: crossoverJie/cim
Length of output: 965
🏁 Script executed:
#!/bin/bash
# Find and examine the Dockerfile to understand context
fd -t f -name "*allin1-ubuntu.Dockerfile" -o -name "*.Dockerfile" | head -20Repository: crossoverJie/cim
Length of output: 290
🏁 Script executed:
#!/bin/bash
# Check for README to verify platform claims
fd -t f -name "README*" | head -5Repository: crossoverJie/cim
Length of output: 290
🏁 Script executed:
#!/bin/bash
cat docker/allin1-ubuntu.Dockerfile | head -50Repository: crossoverJie/cim
Length of output: 1749
🏁 Script executed:
#!/bin/bash
find . -name README.md -o -name README.txt | xargs cat 2>/dev/null | head -100Repository: crossoverJie/cim
Length of output: 3877
🏁 Script executed:
#!/bin/bash
# Check if there's a Dockerfile.build context or multi-arch setup mentioned
find . -name Dockerfile -o -name "*.Dockerfile" | xargs ls -laRepository: crossoverJie/cim
Length of output: 225
🏁 Script executed:
#!/bin/bash
# Look for docker build scripts, CI/CD configs, or hints about multi-arch builds
find . -name "*.yml" -o -name "*.yaml" -o -name "Makefile" -o -name "*.sh" | xargs grep -l "docker\|buildx\|arm64\|amd64" 2>/dev/null | head -10Repository: crossoverJie/cim
Length of output: 90
🏁 Script executed:
#!/bin/bash
# Check if there's any reference to multi-arch or arm in docker-related configs
git ls-files | grep -E "(docker|build|workflow|\.github)" | head -20Repository: crossoverJie/cim
Length of output: 337
🏁 Script executed:
#!/bin/bash
cat .github/workflows/docker.ymlRepository: crossoverJie/cim
Length of output: 1387
Jaeger binary is hardcoded to linux-amd64, breaking multi-arch CI/CD builds.
The CI/CD workflow explicitly builds for linux/amd64, linux/arm64, and linux/arm/v7, but this Dockerfile step downloads only the amd64 binary, causing arm64 builds to fail. Use TARGETARCH build arg to select the correct architecture:
🐛 Proposed fix
+ARG TARGETARCH
# install Jaeger all-in-one (OpenTelemetry trace backend)
# Jaeger UI: 16686, OTLP gRPC: 4317, OTLP HTTP: 4318
-RUN wget -q https://github.com/jaegertracing/jaeger/releases/download/v1.57.0/jaeger-1.57.0-linux-amd64.tar.gz \
- && tar -xzf jaeger-1.57.0-linux-amd64.tar.gz \
- && mv jaeger-1.57.0-linux-amd64/jaeger-all-in-one /usr/local/bin/ \
- && rm -rf jaeger-1.57.0-linux-amd64*
+RUN JAEGER_ARCH=$(case "${TARGETARCH}" in arm64) echo "arm64";; *) echo "amd64";; esac) \
+ && wget -q https://github.com/jaegertracing/jaeger/releases/download/v1.57.0/jaeger-1.57.0-linux-${JAEGER_ARCH}.tar.gz \
+ && tar -xzf jaeger-1.57.0-linux-${JAEGER_ARCH}.tar.gz \
+ && mv jaeger-1.57.0-linux-${JAEGER_ARCH}/jaeger-all-in-one /usr/local/bin/ \
+ && rm -rf jaeger-1.57.0-linux-${JAEGER_ARCH}*Note: Jaeger v1.57.0 provides only amd64 and arm64 binaries. The linux/arm/v7 platform in the CI/CD workflow cannot be satisfied with this version.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # install Jaeger all-in-one (OpenTelemetry trace backend) | |
| # Jaeger UI: 16686, OTLP gRPC: 4317, OTLP HTTP: 4318 | |
| RUN wget -q https://github.com/jaegertracing/jaeger/releases/download/v1.57.0/jaeger-1.57.0-linux-amd64.tar.gz \ | |
| && tar -xzf jaeger-1.57.0-linux-amd64.tar.gz \ | |
| && mv jaeger-1.57.0-linux-amd64/jaeger-all-in-one /usr/local/bin/ \ | |
| && rm -rf jaeger-1.57.0-linux-amd64* | |
| ARG TARGETARCH | |
| # install Jaeger all-in-one (OpenTelemetry trace backend) | |
| # Jaeger UI: 16686, OTLP gRPC: 4317, OTLP HTTP: 4318 | |
| RUN JAEGER_ARCH=$(case "${TARGETARCH}" in arm64) echo "arm64";; *) echo "amd64";; esac) \ | |
| && wget -q https://github.com/jaegertracing/jaeger/releases/download/v1.57.0/jaeger-1.57.0-linux-${JAEGER_ARCH}.tar.gz \ | |
| && tar -xzf jaeger-1.57.0-linux-${JAEGER_ARCH}.tar.gz \ | |
| && mv jaeger-1.57.0-linux-${JAEGER_ARCH}/jaeger-all-in-one /usr/local/bin/ \ | |
| && rm -rf jaeger-1.57.0-linux-${JAEGER_ARCH}* |
🤖 Prompt for AI Agents
In `@docker/allin1-ubuntu.Dockerfile` around lines 13 - 18, The Dockerfile
currently hardcodes the Jaeger artifact "jaeger-1.57.0-linux-amd64.tar.gz" and
will fail for non-amd64 builds; change the step to accept a build ARG
TARGETARCH, map TARGETARCH values to the correct Jaeger archive arch string
(e.g., use "amd64" for TARGETARCH=amd64 and "arm64" for TARGETARCH=arm64, and
fail/skip with a clear message for unsupported TARGETARCH like linux/arm/v7),
then use that mapped arch when composing the download filename and extracting
the "jaeger-all-in-one" binary so the RUN wget/tar/mv/rm sequence fetches the
correct platform-specific tarball.
pom.xml
Outdated
| <micrometer-tracing.version>1.3.0</micrometer-tracing.version> | ||
| <micrometer.version>1.13.0</micrometer.version> | ||
| <opentelemetry.version>1.38.0</opentelemetry.version> |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, find and examine the pom.xml file
find . -maxdepth 2 -name "pom.xml" -type f | head -5Repository: crossoverJie/cim
Length of output: 185
🌐 Web query:
Spring Boot 3.3.0 micrometer version managed dependency BOM
💡 Result:
Spring Boot 3.3.0’s managed dependency BOM (org.springframework.boot:spring-boot-dependencies:3.3.0) aligns with:
- Micrometer (metrics): 1.13.0 (i.e.,
io.micrometer:micrometer-*) [1][2] - Micrometer Tracing: 1.3.0 (i.e.,
io.micrometer:micrometer-tracing-*) [1]
Sources: [1] [2]
🏁 Script executed:
# Examine the root pom.xml file to see parent version and context
cat -n pom.xml | head -60Repository: crossoverJie/cim
Length of output: 2860
🌐 Web query:
Spring Boot 3.3.0 opentelemetry version managed dependency
💡 Result:
Spring Boot 3.3.0 (via org.springframework.boot:spring-boot-dependencies:3.3.0) manages OpenTelemetry 1.37.x — i.e. the managed OpenTelemetry BOM line is io.opentelemetry:opentelemetry-bom:1.37.0 / opentelemetry.version 1.37.0. [1], [2]
Sources: [1], [2]
Remove redundant Micrometer/Micrometer Tracing version properties; align OpenTelemetry version with Spring Boot 3.3.0.
Spring Boot 3.3.0 already manages micrometer-core 1.13.0 and micrometer-tracing 1.3.0—the versions declared here match exactly and are redundant. Removing these properties will prevent version drift.
However, Spring Boot 3.3.0 manages OpenTelemetry 1.37.0, while this pom.xml declares 1.38.0. Verify whether the newer version is intentional; if not, align with Spring Boot's managed version to maintain consistency and avoid potential incompatibilities in the classpath.
🤖 Prompt for AI Agents
In `@pom.xml` around lines 26 - 28, The POM declares redundant properties
micrometer-tracing.version and micrometer.version (matching Spring Boot 3.3.0
managed versions) and a mismatched opentelemetry.version; remove the
micrometer-tracing.version and micrometer.version properties from the pom.xml so
Spring Boot's dependency management controls those versions, and either change
opentelemetry.version to 1.37.0 to match Spring Boot 3.3.0 or remove the
opentelemetry.version property if you intend to use the Boot-managed version
instead (check for any intentional override before removal).
- Use dedicated gRPC endpoint property (otel.exporter.otlp.grpc.endpoint) - Add span.makeCurrent() with Scope for proper context propagation - Add span.recordException() in p2pRoute catch block - Set span status on login failure early return - Fix OTLP metrics property (management.otlp.metrics.export.url) - Externalize tracing sampling probability via env var - Use volatile for static counter fields in cim-server OtelConfig - Use @PostConstruct instead of @bean returning String for metrics - Await TracerProvider shutdown completion with join() - Cache Tracer with volatile field in CIMServerHandle - Support multi-arch Docker builds with TARGETARCH - Expose Zookeeper/Redis ports in Dockerfile - Remove Jaeger as hard startup dependency in supervisord - Remove redundant micrometer version properties from POM - Update OpenTelemetry version to 1.40.0 - Fix README punctuation
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In
`@cim-forward-route/src/main/java/com/crossoverjie/cim/route/controller/RouteController.java`:
- Around line 155-196: p2pRoute currently only catches CIMException so non-CIM
exceptions leave the OpenTelemetry Span with StatusCode.UNSET and unrecorded;
add a general catch (Exception e) after the existing catch to set
span.setStatus(StatusCode.ERROR, e.getMessage()), span.recordException(e), and
populate the response with a generic error code/message (consistent with other
handlers like groupRoute/login), ensuring the catch references the existing Span
variable `span`, the method `p2pRoute`, and the response `res` to mirror the
error handling pattern used elsewhere.
In `@cim-forward-route/src/main/resources/application.yaml`:
- Line 95: Remove the trailing blank line at the end of application.yaml so the
file ends immediately after the last YAML content; open application.yaml, delete
the extra empty newline/whitespace after the final entry, save, and re-run
yamllint to confirm no EOF blank-line warning remains.
- Around line 73-94: The YAML and manual SDK both configure tracing causing
spans to be exported to different endpoints; consolidate to one approach by
either removing the custom OtelConfig and its
OpenTelemetrySdk/SdkTracerProvider/OtlpGrpcSpanExporter so all spans use Spring
Boot auto-config (management.otlp.tracing.endpoint ->
http://localhost:4318/v1/traces) and inject the auto-configured Tracer into
RouteController, or disable Spring Boot tracing auto-config (via
spring.autoconfigure.exclude) and keep OtelConfig
(otel.exporter.otlp.grpc.endpoint -> http://localhost:4317) as the sole
tracer—update RouteController to use whichever Tracer bean remains and remove
the unused tracing configuration to avoid dual export.
🧹 Nitpick comments (3)
cim-server/src/main/java/com/crossoverjie/cim/server/handle/CIMServerHandle.java (1)
38-54: LazyTracerinitialization is fine for this non-Spring-managed handler, but consider a small robustness improvement.The double-read of
cachedTracerwithout synchronization is benign (worst case: redundantgetBeancalls). However, thecatch (Exception e)silently swallows all exceptions and returnsnull, which means configuration errors (e.g., missing bean definition) will be hidden. Consider logging atdebug/warnlevel on first failure so misconfigurations are detectable.Proposed improvement
try { cachedTracer = SpringBeanFactory.getBean(Tracer.class); return cachedTracer; } catch (Exception e) { + log.debug("Tracer bean not available, tracing disabled: {}", e.getMessage()); return null; }cim-forward-route/src/main/java/com/crossoverjie/cim/route/config/OtelConfig.java (1)
36-37: UnusedmeterRegistryfield injection.The
meterRegistryfield is injected via@Resourcebut never referenced. The counter@Beanmethods receiveMeterRegistryas a method parameter (Spring auto-injects it), so this field is dead code.Proposed fix
- `@jakarta.annotation.Resource` - private MeterRegistry meterRegistry; -cim-forward-route/src/main/java/com/crossoverjie/cim/route/controller/RouteController.java (1)
285-301: Consider adding span tracing toregisterAccountfor consistency.All other mutating endpoints (
groupRoute,p2pRoute,login) are instrumented with both spans and counters, butregisterAccountonly has a counter increment. If a registration fails, there's no trace to diagnose it. This is a minor consistency gap.
| public BaseResponse<NULLBody> p2pRoute(@RequestBody P2PReqVO p2pRequest) { | ||
| BaseResponse<NULLBody> res = new BaseResponse(); | ||
|
|
||
| try { | ||
| //获取接收消息用户的路由信息 | ||
| Optional<CIMServerResVO> cimServerResVO = accountService.loadRouteRelatedByUserId(p2pRequest.getReceiveUserId()); | ||
| // === OpenTelemetry: 创建私聊路由 Span === | ||
| Span span = tracer.spanBuilder("cim.route.p2pMsg").startSpan(); | ||
|
|
||
| try (Scope scope = span.makeCurrent()) { | ||
| // 获取接收消息用户的路由信息 | ||
| Optional<CIMServerResVO> cimServerResVO = accountService | ||
| .loadRouteRelatedByUserId(p2pRequest.getReceiveUserId()); | ||
| if (cimServerResVO.isEmpty()) { | ||
| log.warn("userId={} not online, save offline msg", p2pRequest.getReceiveUserId()); | ||
| offlineMsgService.saveOfflineMsg(p2pRequest); | ||
| throw new CIMException(OFF_LINE); | ||
| } | ||
|
|
||
| //p2pRequest.getReceiveUserId()==>消息接收者的 userID | ||
| ChatReqVO chatVO = new ChatReqVO(p2pRequest.getReceiveUserId(), p2pRequest.getMsg(), p2pRequest.getBatchMsg()); | ||
| // p2pRequest.getReceiveUserId()==>消息接收者的 userID | ||
| ChatReqVO chatVO = new ChatReqVO(p2pRequest.getReceiveUserId(), p2pRequest.getMsg(), | ||
| p2pRequest.getBatchMsg()); | ||
| accountService.pushMsg(cimServerResVO.get(), p2pRequest.getUserId(), chatVO); | ||
|
|
||
| // 记录 Span 属性 | ||
| span.setAttribute("senderId", p2pRequest.getUserId()); | ||
| span.setAttribute("receiverId", p2pRequest.getReceiveUserId()); | ||
| span.setStatus(StatusCode.OK); | ||
|
|
||
| // === Metrics: 递增私聊计数器 === | ||
| p2pMessageCounter.increment(); | ||
|
|
||
| res.setCode(StatusEnum.SUCCESS.getCode()); | ||
| res.setMessage(StatusEnum.SUCCESS.getMessage()); | ||
|
|
||
| } catch (CIMException e) { | ||
| span.setStatus(StatusCode.ERROR, e.getErrorMessage()); | ||
| span.recordException(e); | ||
| res.setCode(e.getErrorCode()); | ||
| res.setMessage(e.getErrorMessage()); | ||
| } finally { | ||
| span.end(); | ||
| } | ||
| return res; | ||
| } |
There was a problem hiding this comment.
p2pRoute only catches CIMException — non-CIM exceptions leave span status UNSET.
If a non-CIMException (e.g., NullPointerException, network error) is thrown, it bypasses the catch (CIMException e) block and goes directly to finally, where span.end() is called but the span status remains UNSET and the exception is not recorded. This contrasts with groupRoute and login, which catch all Exception types.
Proposed fix — add a general catch block
} catch (CIMException e) {
span.setStatus(StatusCode.ERROR, e.getErrorMessage());
span.recordException(e);
res.setCode(e.getErrorCode());
res.setMessage(e.getErrorMessage());
+ } catch (Exception e) {
+ span.setStatus(StatusCode.ERROR, e.getMessage());
+ span.recordException(e);
+ throw e;
} finally {
span.end();
}🤖 Prompt for AI Agents
In
`@cim-forward-route/src/main/java/com/crossoverjie/cim/route/controller/RouteController.java`
around lines 155 - 196, p2pRoute currently only catches CIMException so non-CIM
exceptions leave the OpenTelemetry Span with StatusCode.UNSET and unrecorded;
add a general catch (Exception e) after the existing catch to set
span.setStatus(StatusCode.ERROR, e.getMessage()), span.recordException(e), and
populate the response with a generic error code/message (consistent with other
handlers like groupRoute/login), ensuring the catch references the existing Span
variable `span`, the method `p2pRoute`, and the response `res` to mirror the
error handling pattern used elsewhere.
| # OpenTelemetry 配置 | ||
| management: | ||
| otlp: | ||
| tracing: | ||
| endpoint: http://localhost:4318/v1/traces | ||
| metrics: | ||
| export: | ||
| url: http://localhost:4318/v1/metrics | ||
| tracing: | ||
| sampling: | ||
| probability: ${TRACING_SAMPLING_PROBABILITY:1.0} | ||
| endpoints: | ||
| web: | ||
| exposure: | ||
| include: health,info,metrics | ||
|
|
||
| # OpenTelemetry gRPC endpoint for SDK exporter | ||
| otel: | ||
| exporter: | ||
| otlp: | ||
| grpc: | ||
| endpoint: ${OTEL_EXPORTER_OTLP_GRPC_ENDPOINT:http://localhost:4317} |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find and examine OtelConfig.java
fd -type f -name "OtelConfig.java" -exec cat -n {} \;
echo "---"
# Check the application.yaml file to confirm configuration
fd -type f -name "application.yaml" -path "*/cim-forward-route/*" -exec cat -n {} \;Repository: crossoverJie/cim
Length of output: 422
🏁 Script executed:
#!/bin/bash
# Find OtelConfig.java anywhere in the repo
find . -name "OtelConfig.java" -type f
echo "---"
# Also search for files containing OpenTelemetrySdk
rg -l "OpenTelemetrySdk" --type java
echo "---"
# Check for application.yaml in the cim-forward-route module
find . -path "*/cim-forward-route/*/application.yaml" -type f
echo "---"
# Look for pom.xml in cim-forward-route to check dependencies
find . -path "*/cim-forward-route/pom.xml" -type fRepository: crossoverJie/cim
Length of output: 531
🏁 Script executed:
#!/bin/bash
# Examine OtelConfig.java in cim-forward-route
cat -n cim-forward-route/src/main/java/com/crossoverjie/cim/route/config/OtelConfig.javaRepository: crossoverJie/cim
Length of output: 4419
🏁 Script executed:
#!/bin/bash
# Check pom.xml for Spring Boot version and OpenTelemetry dependencies
cat -n cim-forward-route/pom.xmlRepository: crossoverJie/cim
Length of output: 6444
🏁 Script executed:
#!/bin/bash
# Read the application.yaml file for cim-forward-route
cat -n cim-forward-route/src/main/resources/application.yaml
echo "---"
# Check if there are any autoconfiguration exclusions
rg -n "spring.autoconfigure.exclude" cim-forward-route/Repository: crossoverJie/cim
Length of output: 2881
🏁 Script executed:
#!/bin/bash
# Check if GlobalOpenTelemetry.set() is called
rg -n "GlobalOpenTelemetry.set" --type java
echo "---"
# Check parent pom.xml for Spring Boot version
cat cim/pom.xml | head -50Repository: crossoverJie/cim
Length of output: 124
🏁 Script executed:
#!/bin/bash
# Find the parent pom.xml
find . -name "pom.xml" -path "*/cim/*" -type f | head -5
echo "---"
# Check the root pom.xml
ls -la pom.xml 2>/dev/null && cat -n pom.xml | head -100Repository: crossoverJie/cim
Length of output: 4736
🏁 Script executed:
#!/bin/bash
# Check for any spring.autoconfigure.exclude in the parent or module poms
rg -n "spring.autoconfigure.exclude" --type xml
echo "---"
# Check for how the Tracer bean is being used in the codebase
rg -n "private.*Tracer\|@Autowired.*Tracer\|@Resource.*Tracer" --type java cim-forward-route/src/main/java/
echo "---"
# Check if spring-boot-starter-actuator triggers auto-config
rg -n "spring.boot.actuate" --type java cim-forward-route/src/main/java/ | head -20Repository: crossoverJie/cim
Length of output: 67
🏁 Script executed:
#!/bin/bash
# Search for any usage of the Tracer bean in cim-forward-route
rg -n "tracer\|Tracer" --type java cim-forward-route/src/main/java/ | grep -v "TracerProvider\|OpenTelemetry" | head -30
echo "---"
# Check if there are any span creations
rg -n "startSpan\|startActiveSpan" --type java cim-forward-route/src/main/java/Repository: crossoverJie/cim
Length of output: 63
🏁 Script executed:
#!/bin/bash
# Check cim-server's OtelConfig to see if it follows the same pattern
cat -n cim-server/src/main/java/com/crossoverjie/cim/server/config/OtelConfig.java | head -70
echo "---"
# Check if cim-server's application.yaml has similar configuration
cat -n cim-server/src/main/resources/application.yaml | grep -A 25 "OpenTelemetry"Repository: crossoverJie/cim
Length of output: 3954
🏁 Script executed:
#!/bin/bash
# Check if cim-server uses the Tracer bean
rg -n "Tracer" cim-server/src/main/java/ --type java | grep -v "TracerProvider\|@Bean" | head -20
echo "---"
# Check how GlobalOpenTelemetry is used or if it's set anywhere
rg -n "GlobalOpenTelemetry\|setGlobal" --type javaRepository: crossoverJie/cim
Length of output: 1948
🏁 Script executed:
#!/bin/bash
# Check if cim-forward-route's RouteHandler or MessageHandler uses Tracer
rg -n "import.*Tracer|new.*Span|startSpan|startActiveSpan" cim-forward-route/src/main/java/ --type java
echo "---"
# Check for Tracer usage in cim-forward-route via SpringBeanFactory or direct injection
rg -n "@Autowired.*Tracer|@Resource.*Tracer|getBean.*Tracer" cim-forward-route/src/main/java/ --type java
echo "---"
# List main classes in cim-forward-route to check if Tracer is injected
ls -la cim-forward-route/src/main/java/com/crossoverjie/cim/route/*/Repository: crossoverJie/cim
Length of output: 3642
🏁 Script executed:
#!/bin/bash
# Check how tracer is injected in RouteController
cat -n cim-forward-route/src/main/java/com/crossoverjie/cim/route/controller/RouteController.java | head -50Repository: crossoverJie/cim
Length of output: 2603
🏁 Script executed:
#!/bin/bash
# Check the field declarations in RouteController
sed -n '40,120p' cim-forward-route/src/main/java/com/crossoverjie/cim/route/controller/RouteController.javaRepository: crossoverJie/cim
Length of output: 2486
Dual tracing setup exports spans to different endpoints — consolidate to single configuration.
The application configures two separate OpenTelemetry tracing paths:
-
Spring Boot auto-configuration (via
micrometer-tracing-bridge-otelon classpath +management.otlp.tracing.endpointin YAML) auto-instruments HTTP requests and exports spans via HTTP tohttp://localhost:4318/v1/traces. -
Manual SDK in OtelConfig creates a separate
OpenTelemetrySdkwithSdkTracerProviderandOtlpGrpcSpanExporter, exporting spans via gRPC tohttp://localhost:4317(configured byotel.exporter.otlp.grpc.endpoint).
RouteController explicitly injects and uses the manual Tracer bean to create spans for group messaging, P2P messaging, and login operations. This means spans from these operations use the manual gRPC path (port 4317), while Spring Boot's auto-instrumented request/response tracing uses the HTTP path (port 4318). If only one collector is running, you'll lose traces; if both are running, traces are split across endpoints and difficult to correlate.
Choose one approach:
- Option A: Rely solely on Spring Boot's auto-configuration by removing OtelConfig and letting Spring Boot handle all tracing.
- Option B: Disable Spring Boot's auto-configuration (
spring.autoconfigure.exclude) and keep only OtelConfig for manual control.
🤖 Prompt for AI Agents
In `@cim-forward-route/src/main/resources/application.yaml` around lines 73 - 94,
The YAML and manual SDK both configure tracing causing spans to be exported to
different endpoints; consolidate to one approach by either removing the custom
OtelConfig and its OpenTelemetrySdk/SdkTracerProvider/OtlpGrpcSpanExporter so
all spans use Spring Boot auto-config (management.otlp.tracing.endpoint ->
http://localhost:4318/v1/traces) and inject the auto-configured Tracer into
RouteController, or disable Spring Boot tracing auto-config (via
spring.autoconfigure.exclude) and keep OtelConfig
(otel.exporter.otlp.grpc.endpoint -> http://localhost:4317) as the sole
tracer—update RouteController to use whichever Tracer bean remains and remove
the unused tracing configuration to avoid dual export.
| otlp: | ||
| grpc: | ||
| endpoint: ${OTEL_EXPORTER_OTLP_GRPC_ENDPOINT:http://localhost:4317} | ||
|
|
There was a problem hiding this comment.
Trailing blank line at end of file.
YAMLlint flags an extra blank line at the end. Remove it to satisfy the linter.
🧰 Tools
🪛 YAMLlint (1.38.0)
[warning] 95-95: too many blank lines (1 > 0)
(empty-lines)
🤖 Prompt for AI Agents
In `@cim-forward-route/src/main/resources/application.yaml` at line 95, Remove the
trailing blank line at the end of application.yaml so the file ends immediately
after the last YAML content; open application.yaml, delete the extra empty
newline/whitespace after the final entry, save, and re-run yamllint to confirm
no EOF blank-line warning remains.
|
Sorry for the late reply, please check the CI. |
Spring Boot BOM does not manage micrometer-tracing-bridge-otel and micrometer-registry-otlp versions. Restored version properties with updated versions (micrometer-tracing 1.4.2, micrometer 1.14.3).
What
Integrate OpenTelemetry into
cim-serverandcim-forward-routemodules to enable distributed tracing and business metrics collection, as listed in the TODO.Why
As CIM is a distributed IM system with multiple services (server, route), it's essential to have end-to-end tracing and metrics to monitor message flow, identify bottlenecks, and debug issues in production.
Changes
Dependencies (pom.xml)
micrometer-tracing-bridge-otel,opentelemetry-exporter-otlp,opentelemetry-api,opentelemetry-sdk,micrometer-registry-otlpto root and module POMs.cim-server
LOGIN_REQUESTandPINGcommands with metrics counters.cim-forward-route
Docker
Custom Metrics
cim.server.online.userscim.server.messages.pushedcim.server.heartbeat.countcim.server.login.countcim.route.messages.groupcim.route.messages.p2pcim.route.login.countcim.route.register.countHow to verify
http://localhost:16686.cim-serverorcim-forward-routeto view traces.http://localhost:8081/actuator/metrics/cim.server.online.users.Summary by CodeRabbit
New Features
Documentation
Chores