feat: Integrate OpenTelemetry for distributed tracing and metrics #186

Open
KitAndrew wants to merge 3 commits into crossoverJie:master from KitAndrew:feat/opentelemetry-integration

Conversation


@KitAndrew KitAndrew commented Feb 11, 2026

What

Integrate OpenTelemetry into cim-server and cim-forward-route modules to enable distributed tracing and business metrics collection, as listed in the TODO.

Why

As CIM is a distributed IM system with multiple services (server, route), it's essential to have end-to-end tracing and metrics to monitor message flow, identify bottlenecks, and debug issues in production.

Changes

Dependencies (pom.xml)

  • Add micrometer-tracing-bridge-otel, opentelemetry-exporter-otlp, opentelemetry-api, opentelemetry-sdk, micrometer-registry-otlp to root and module POMs.

cim-server

  • New OtelConfig.java: Initialize OpenTelemetry SDK, configure OTLP exporter to Jaeger, register custom metrics.
  • Modified CIMServerHandle.java: Add tracing spans for LOGIN_REQUEST and PING commands with metrics counters.
  • Modified CIMServer.java: Add tracing span for sendMsg with message pushed counter.
  • Modified application.yaml: Add OTLP endpoints and Actuator metrics exposure.

cim-forward-route

  • New OtelConfig.java: Initialize OpenTelemetry SDK with route-specific metrics (group/p2p message counters, login/register counters).
  • Modified RouteController.java: Add tracing spans for groupRoute, p2pRoute, login with corresponding metrics.
  • Modified application.yaml: Add OTLP endpoints and Actuator metrics exposure.

Docker

  • Modified allin1-ubuntu.Dockerfile: Add Jaeger all-in-one binary.
  • Modified supervisord.conf: Add a Jaeger service; server and route now wait for the Jaeger OTLP port (4317) before starting.

Custom Metrics

| Module | Metric | Type |
| --- | --- | --- |
| cim-server | cim.server.online.users | Gauge |
| cim-server | cim.server.messages.pushed | Counter |
| cim-server | cim.server.heartbeat.count | Counter |
| cim-server | cim.server.login.count | Counter |
| cim-forward-route | cim.route.messages.group | Counter |
| cim-forward-route | cim.route.messages.p2p | Counter |
| cim-forward-route | cim.route.login.count | Counter |
| cim-forward-route | cim.route.register.count | Counter |
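The metrics above are registered through Micrometer. A minimal sketch of how such a counter and gauge could be registered and read back — assuming micrometer-core on the classpath; `SimpleMeterRegistry` and the `AtomicInteger` stand in for the OTLP registry and session holder actually used in the PR:

```java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;
import java.util.concurrent.atomic.AtomicInteger;

public class MetricsSketch {

    // Registers one gauge and one counter, simulates two pushes,
    // and returns the pushed-message count.
    static double demo() {
        MeterRegistry registry = new SimpleMeterRegistry();

        // Gauge: sampled on demand; the AtomicInteger stands in for
        // SessionSocketHolder.getRelationShip().size() in the PR.
        AtomicInteger onlineUsers = new AtomicInteger(0);
        Gauge.builder("cim.server.online.users", onlineUsers, AtomicInteger::get)
                .description("Current number of online users")
                .register(registry);

        // Counter: monotonically increasing total.
        Counter pushed = Counter.builder("cim.server.messages.pushed")
                .description("Total number of messages pushed to clients")
                .register(registry);

        onlineUsers.incrementAndGet();
        pushed.increment();
        pushed.increment();
        return pushed.count();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 2.0
    }
}
```

The gauge/counter split in the table follows the same rule: values that can go down (online users) are gauges, totals that only grow are counters.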

How to verify

  1. Start the all-in-one Docker container (includes Jaeger).
  2. Access Jaeger UI at http://localhost:16686.
  3. Select service cim-server or cim-forward-route to view traces.
  4. Access metrics via Actuator: http://localhost:8081/actuator/metrics/cim.server.online.users.

Summary by CodeRabbit

  • New Features

    • Added distributed tracing and metrics with OTLP export and local Jaeger all-in-one for visualization (port 16686); instrumented key flows (login, heartbeat, message push, routing) with spans and counters.
  • Documentation

    • Marked OpenTelemetry integration as completed in README(s).
  • Chores

    • Updated runtime orchestration to launch Jaeger alongside services and expose OTLP ports.


coderabbitai bot commented Feb 11, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a3779e4 and 974aca4.

📒 Files selected for processing (1)
  • pom.xml
🚧 Files skipped from review as they are similar to previous changes (1)
  • pom.xml

📝 Walkthrough

Walkthrough

Adds OpenTelemetry tracing and Micrometer OTLP metrics across services: dependency updates, new OtelConfig classes, Tracer and Counter beans, instrumentation of route and server flows with spans and counters, application YAML OTLP settings, and Jaeger runtime additions in Docker configs.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation: `README.md`, `README-zh.md` | Marked OpenTelemetry integration item as completed in README files. |
| Root dependency management: `pom.xml` | Added OpenTelemetry and Micrometer version properties and dependencyManagement entries for tracing and OTLP registry. |
| Module dependencies: `cim-forward-route/pom.xml`, `cim-server/pom.xml` | Added dependencies: micrometer-tracing-bridge-otel, opentelemetry-api, opentelemetry-sdk, opentelemetry-exporter-otlp, micrometer-registry-otlp. |
| OpenTelemetry config classes: `cim-forward-route/src/.../config/OtelConfig.java`, `cim-server/src/.../config/OtelConfig.java` | New Spring config beans to build the OpenTelemetry SDK, OTLP gRPC exporter, BatchSpanProcessor, Tracer beans, custom Micrometer Counters/Gauges, resource attributes, and graceful shutdown hooks. |
| Controller instrumentation: `cim-forward-route/src/.../controller/RouteController.java` | Injected Tracer and Counter fields; wrapped groupRoute, p2pRoute, login, registerAccount with spans (attributes, status, exception recording) and incremented counters. |
| Server instrumentation: `cim-server/src/.../handle/CIMServerHandle.java`, `cim-server/src/.../server/CIMServer.java` | Added Tracer usage and spans for LOGIN/PING and message-push flows; added metrics counter increments and a lazy tracer retrieval helper; minor lifecycle/close adjustments. |
| Application properties: `cim-forward-route/src/.../application.yaml`, `cim-server/src/.../application.yaml` | Added OTLP tracing/metrics endpoints, sampling probability, actuator exposure, and otel.exporter.otlp.grpc.endpoint. |
| Docker / runtime: `docker/allin1-ubuntu.Dockerfile`, `docker/supervisord.conf` | Installed Jaeger all-in-one, exposed Jaeger/OTLP ports (16686, 4317, 4318), added a supervisord jaeger program, and updated service start to wait for dependencies. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant RouteController
    participant CIMServer
    participant OpenTelemetry as Tracer
    participant OTLP

    Client->>RouteController: send message / login / register
    activate RouteController
    RouteController->>OpenTelemetry: start span (route)
    RouteController->>CIMServer: forward message / request
    activate CIMServer
    CIMServer->>OpenTelemetry: start span (server)
    CIMServer->>CIMServer: write socket / update session / metrics++
    CIMServer->>OpenTelemetry: set attributes, set status, end span
    deactivate CIMServer
    RouteController->>OpenTelemetry: set attributes, set status, end span
    deactivate RouteController
    OpenTelemetry->>OTLP: BatchSpanProcessor exports spans
    OTLP->>OTLP: Jaeger stores & displays traces
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • Feat/offline msg #170: Modifies RouteController (offline-message endpoints) and may overlap/conflict with tracing/metrics changes applied to the same controller.

Suggested labels

new features

Suggested reviewers

  • crossoverJie

Poem

🐰 Hopping through spans with a carrot of cheer,

Counters that count every hop and each call,
Jaeger hums softly, traces standing near,
I nibble on logs and watch metrics tall,
Observability blooms — hop, sniff, and all!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 32.00%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The PR title accurately summarizes the main changes - adding OpenTelemetry integration for distributed tracing and metrics across cim-server and cim-forward-route. |



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 13

🤖 Fix all issues with AI agents
In
`@cim-forward-route/src/main/java/com/crossoverjie/cim/route/config/OtelConfig.java`:
- Around line 50-52: The endpoint derivation using otlpEndpoint.replace(...) is
fragile; introduce a dedicated gRPC endpoint configuration (e.g., a new property
name like otlpGrpcEndpoint) and use that when building the OtlpGrpcSpanExporter
instead of mutating otlpEndpoint. Update OtelConfig to read the new
otlpGrpcEndpoint (with a sensible default or explicit validation) and construct
OtlpGrpcSpanExporter.builder().setEndpoint(otlpGrpcEndpoint).build(); leave
otlpEndpoint for HTTP exporters and ensure any existing callers/configs are
updated to provide the new property.

In
`@cim-forward-route/src/main/java/com/crossoverjie/cim/route/controller/RouteController.java`:
- Around line 186-189: In RouteController's p2pRoute method catch block for
CIMException you only call span.setStatus(...) and set the response, so
exception details aren't recorded in tracing; add span.recordException(e)
alongside span.setStatus(StatusCode.ERROR, e.getErrorMessage()) (matching the
pattern used in groupRoute and login) so the exception and stack trace are
emitted to Jaeger when p2pRoute fails.
- Around line 235-238: The early return when status != StatusEnum.SUCCESS sets
span attribute "loginStatus" but leaves the span status UNSET; before returning
(in the failure branch in RouteController where you set
span.setAttribute("loginStatus", "failed")), also set the span status to an
error (e.g., span.setStatus(...) with an ERROR/StatusCode.ERROR and optional
message) so the finally block that ends the span records a non-OK status and
failed logins are visible in Jaeger.
- Around line 101-102: The created Span in RouteController (Span span =
tracer.spanBuilder("cim.route.groupMsg").startSpan() inside groupRoute — and
similarly in p2pRoute and login) is never made current, breaking context
propagation; fix by making the span current with span.makeCurrent() (assign to a
Scope, e.g., try-with-resources Scope scope = span.makeCurrent()) before
executing downstream work and ensure the Scope is closed and span.end() is
always called in a finally block so child spans and instrumented libraries
inherit the context.

In `@cim-forward-route/src/main/resources/application.yaml`:
- Around line 73-82: The OTLP metrics property path is incorrect and tracing
sampling is hard-coded; change the metrics config key from
management.otlp.metrics.endpoint to management.otlp.metrics.export.url (per
Spring Boot/Micrometer OTLP) and externalize the tracing sampling probability
(management.tracing.sampling.probability) by reading it from an environment
variable or profile-specific property (e.g.,
${TRACING_SAMPLING_PROBABILITY:0.1}) or a non-dev profile override so production
does not trace 100% of requests.

In `@cim-server/src/main/java/com/crossoverjie/cim/server/config/OtelConfig.java`:
- Around line 47-52: The static Counter fields loginCounter, heartbeatCounter,
and messagePushedCounter in OtelConfig are written during Spring initialization
and later read from Netty event-loop threads, so add the volatile modifier to
each static field declaration to ensure visibility across threads (Lombok's
`@Getter` can stay as-is); update the declarations of loginCounter,
heartbeatCounter, and messagePushedCounter to be private static volatile Counter
... so Netty threads reliably see the initialized non-null counters.

In
`@cim-server/src/main/java/com/crossoverjie/cim/server/handle/CIMServerHandle.java`:
- Around line 93-101: The empty if block after creating Span (Span span in
CIMServerHandle) should actually make the span current: call span.makeCurrent()
and assign to a Scope (e.g., Scope scope) when span != null, and ensure the
Scope is closed and span.end() in the finally block; alternatively remove the
misleading if block—prefer implementing Scope scope = span.makeCurrent() inside
the try and closing scope.close() (and ending span) in the finally to ensure
proper context propagation.

In `@cim-server/src/main/resources/application.yaml`:
- Around line 40-52: The YAML uses an unrecognized property for OTLP metrics:
replace management.otlp.metrics.endpoint with the correct nested property
management.otlp.metrics.export.url so that metrics export is configured as
management -> otlp -> metrics -> export -> url (keep
management.otlp.tracing.endpoint as-is); update the application.yaml to reflect
this property name change so Spring Boot 3.x will actually export metrics to the
OTLP collector.

In `@docker/allin1-ubuntu.Dockerfile`:
- Around line 45-46: The EXPOSE directive currently lists only Jaeger/OTel ports
(16686, 4317, 4318) but omits Zookeeper and Redis; update the EXPOSE instruction
in the Dockerfile (modify the EXPOSE line) to include ports 2181 and 6379 so it
reads to expose 16686 4317 4318 2181 6379, and verify whether port 8083 should
be added based on the application before including it.
- Around line 13-18: The Dockerfile currently hardcodes the Jaeger artifact
"jaeger-1.57.0-linux-amd64.tar.gz" and will fail for non-amd64 builds; change
the step to accept a build ARG TARGETARCH, map TARGETARCH values to the correct
Jaeger archive arch string (e.g., use "amd64" for TARGETARCH=amd64 and "arm64"
for TARGETARCH=arm64, and fail/skip with a clear message for unsupported
TARGETARCH like linux/arm/v7), then use that mapped arch when composing the
download filename and extracting the "jaeger-all-in-one" binary so the RUN
wget/tar/mv/rm sequence fetches the correct platform-specific tarball.

In `@docker/supervisord.conf`:
- Around line 34-38: The supervisord [program:server] startup sequence makes
Jaeger/OTLP a hard dependency by calling /wait-for-it.sh on port 4317 before
launching java -jar /cim-server.jar; update the command in the supervisord
[program:server] (and any analogous [program:cim-route]) to remove the
/wait-for-it.sh -t 30 localhost:4317 check (or if you must keep it, increase the
timeout significantly and add a clear comment documenting that Jaeger is
required), leaving only the Redis (6379) and Zookeeper (2181) waits so the OTLP
exporter can retry/buffer without blocking application start.

In `@pom.xml`:
- Around line 26-28: The POM declares redundant properties
micrometer-tracing.version and micrometer.version (matching Spring Boot 3.3.0
managed versions) and a mismatched opentelemetry.version; remove the
micrometer-tracing.version and micrometer.version properties from the pom.xml so
Spring Boot's dependency management controls those versions, and either change
opentelemetry.version to 1.37.0 to match Spring Boot 3.3.0 or remove the
opentelemetry.version property if you intend to use the Boot-managed version
instead (check for any intentional override before removal).

In `@README.md`:
- Line 25: The checklist item "Integrate OpenTelemetry ." has an extra space
before the period; update that checklist line (the "- [x] Integrate
OpenTelemetry ." entry) to remove the extra space so it reads "Integrate
OpenTelemetry." to correct the punctuation.
🧹 Nitpick comments (6)
docker/supervisord.conf (1)

24-32: Nit: Redundant OTLP enablement — both CLI flag and environment variable are set.

Line 25 passes --collector.otlp.enabled=true and Line 26 sets the equivalent COLLECTOR_OTLP_ENABLED="true" env var. One of them is sufficient.

cim-forward-route/src/main/java/com/crossoverjie/cim/route/config/OtelConfig.java (2)

115-121: tracerProvider.shutdown() is fire-and-forget — pending spans may be lost.

SdkTracerProvider.shutdown() returns a CompletableResultCode. Without awaiting its completion, in-flight spans in the BatchSpanProcessor may not be flushed before the JVM exits.

Proposed fix: await shutdown completion

```diff
     @PreDestroy
     public void shutdown() {
         if (tracerProvider != null) {
-            tracerProvider.shutdown();
+            tracerProvider.shutdown().join(10, java.util.concurrent.TimeUnit.SECONDS);
             log.info("OpenTelemetry TracerProvider shut down");
         }
     }
```

1-121: Significant code duplication with cim-server's OtelConfig.

The OTLP SDK initialization, tracer bean creation, and shutdown logic are nearly identical between this class and cim-server/…/OtelConfig.java. Consider extracting a shared base configuration (e.g., in cim-common) parameterized by service.name, with module-specific subclasses or configs only defining their custom metrics.

cim-server/src/main/java/com/crossoverjie/cim/server/config/OtelConfig.java (1)

116-141: @Bean returning a dummy String to trigger metric registration is a code smell.

Returning "metricsRegistered" registers an unrelated String bean in the context and obscures the intent. A @PostConstruct method on this @Configuration class (with MeterRegistry injected as a field) or a SmartInitializingSingleton would be more idiomatic.

Suggested alternative

```diff
+    @Resource
+    private MeterRegistry meterRegistry;
+
-    @Bean
-    public String registerMetrics(MeterRegistry meterRegistry) {
+    @PostConstruct
+    public void registerMetrics() {
         Gauge.builder("cim.server.online.users",
                 () -> SessionSocketHolder.getRelationShip().size())
                 .description("Current number of online users connected to this server")
                 .register(meterRegistry);

         loginCounter = Counter.builder("cim.server.login.count")
                 .description("Total number of login requests processed")
                 .register(meterRegistry);

         heartbeatCounter = Counter.builder("cim.server.heartbeat.count")
                 .description("Total number of heartbeat (PING) messages received")
                 .register(meterRegistry);

         messagePushedCounter = Counter.builder("cim.server.messages.pushed")
                 .description("Total number of messages pushed to clients")
                 .register(meterRegistry);

         log.info("CIM server custom metrics registered");
-        return "metricsRegistered";
     }
```
cim-server/src/main/java/com/crossoverjie/cim/server/handle/CIMServerHandle.java (1)

41-47: Per-request SpringBeanFactory.getBean() lookup for the Tracer.

getTracer() performs a Spring context lookup on every LOGIN_REQUEST. While singletons are cached, the lookup still traverses the bean factory. Cache the result in a volatile field after the first successful retrieval.

Proposed fix

```diff
+    private volatile Tracer cachedTracer;
+
     private Tracer getTracer() {
+        if (cachedTracer != null) {
+            return cachedTracer;
+        }
         try {
-            return SpringBeanFactory.getBean(Tracer.class);
+            cachedTracer = SpringBeanFactory.getBean(Tracer.class);
+            return cachedTracer;
         } catch (Exception e) {
             return null;
         }
     }
```
cim-server/src/main/java/com/crossoverjie/cim/server/server/CIMServer.java (1)

85-139: Span ends before the async writeAndFlush completes.

span.end() in finally (line 137) executes immediately after socketChannel.writeAndFlush(protocol) is called, but the actual network write hasn't completed yet — the ChannelFutureListener fires asynchronously later. This means the span duration does not reflect actual write latency or failures. If accurate push timing matters, consider ending the span inside the listener callback instead.
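The volatile-counter and cached-tracer suggestions above both rely on safe publication through a volatile field. A dependency-free sketch of that pattern — class and variable names here are illustrative, not from the PR:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Lazily caches a looked-up singleton in a volatile field, mirroring the
// suggested getTracer() fix: a benign race may cause duplicate lookups, but
// any thread that reads a non-null value sees a fully initialized object.
public class LazyLookupCache<T> {
    private volatile T cached;            // volatile => cross-thread visibility
    private final Supplier<T> lookup;     // stands in for SpringBeanFactory.getBean

    public LazyLookupCache(Supplier<T> lookup) {
        this.lookup = lookup;
    }

    public T get() {
        T local = cached;                 // one volatile read per call
        if (local == null) {
            try {
                local = lookup.get();
                cached = local;
            } catch (Exception e) {
                return null;              // lookup unavailable: tracing disabled
            }
        }
        return local;
    }

    public static void main(String[] args) {
        AtomicInteger lookups = new AtomicInteger();
        LazyLookupCache<String> cache =
                new LazyLookupCache<>(() -> "tracer-" + lookups.incrementAndGet());
        System.out.println(cache.get()); // tracer-1
        System.out.println(cache.get()); // tracer-1 (cached, no second lookup)
    }
}
```

This avoids per-request bean-factory traversal while staying correct under Netty's multi-threaded event loops, without needing synchronization on the hot path.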

Comment on lines 235 to 238

```java
if (status != StatusEnum.SUCCESS) {
    span.setAttribute("loginStatus", "failed");
    return res;
}
```

⚠️ Potential issue | 🟡 Minor

Login early-return leaves span status as UNSET.

When status != SUCCESS, the method returns early at line 237. The span is ended in finally, but its status is never explicitly set (only the attribute loginStatus=failed is recorded). In Jaeger, this span will appear without a clear OK/ERROR status, making it harder to filter and alert on failed logins.

Proposed fix

```diff
             if (status != StatusEnum.SUCCESS) {
                 span.setAttribute("loginStatus", "failed");
+                span.setStatus(StatusCode.OK); // request itself succeeded; login was rejected by business logic
                 return res;
             }
```
🤖 Prompt for AI Agents
In
`@cim-forward-route/src/main/java/com/crossoverjie/cim/route/controller/RouteController.java`
around lines 235 - 238, The early return when status != StatusEnum.SUCCESS sets
span attribute "loginStatus" but leaves the span status UNSET; before returning
(in the failure branch in RouteController where you set
span.setAttribute("loginStatus", "failed")), also set the span status to an
error (e.g., span.setStatus(...) with an ERROR/StatusCode.ERROR and optional
message) so the finally block that ends the span records a non-OK status and
failed logins are visible in Jaeger.

Comment on lines 13 to 18
# install Jaeger all-in-one (OpenTelemetry trace backend)
# Jaeger UI: 16686, OTLP gRPC: 4317, OTLP HTTP: 4318
RUN wget -q https://github.com/jaegertracing/jaeger/releases/download/v1.57.0/jaeger-1.57.0-linux-amd64.tar.gz \
&& tar -xzf jaeger-1.57.0-linux-amd64.tar.gz \
&& mv jaeger-1.57.0-linux-amd64/jaeger-all-in-one /usr/local/bin/ \
&& rm -rf jaeger-1.57.0-linux-amd64*

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Check Jaeger v1.57.0 available architectures
curl -sL https://api.github.com/repos/jaegertracing/jaeger/releases/tags/v1.57.0 | \
  jq -r '.assets[].name' | grep -i 'linux'
```

Repository: crossoverJie/cim

Length of output: 505

🏁 Script executed:

```bash
#!/bin/bash
# Get more details on Jaeger release to understand available architectures
curl -sL https://api.github.com/repos/jaegertracing/jaeger/releases/tags/v1.57.0 | \
  jq -r '.assets[] | "\(.name)"' | sort
```

Repository: crossoverJie/cim

Length of output: 965

🏁 Script executed:

```bash
#!/bin/bash
# Find and examine the Dockerfile to understand context
fd -t f -name "*allin1-ubuntu.Dockerfile" -o -name "*.Dockerfile" | head -20
```

Repository: crossoverJie/cim

Length of output: 290

🏁 Script executed:

```bash
#!/bin/bash
# Check for README to verify platform claims
fd -t f -name "README*" | head -5
```

Repository: crossoverJie/cim

Length of output: 290

🏁 Script executed:

```bash
#!/bin/bash
cat docker/allin1-ubuntu.Dockerfile | head -50
```

Repository: crossoverJie/cim

Length of output: 1749

🏁 Script executed:

```bash
#!/bin/bash
find . -name README.md -o -name README.txt | xargs cat 2>/dev/null | head -100
```

Repository: crossoverJie/cim

Length of output: 3877

🏁 Script executed:

```bash
#!/bin/bash
# Check if there's a Dockerfile.build context or multi-arch setup mentioned
find . -name Dockerfile -o -name "*.Dockerfile" | xargs ls -la
```

Repository: crossoverJie/cim

Length of output: 225

🏁 Script executed:

```bash
#!/bin/bash
# Look for docker build scripts, CI/CD configs, or hints about multi-arch builds
find . -name "*.yml" -o -name "*.yaml" -o -name "Makefile" -o -name "*.sh" | xargs grep -l "docker\|buildx\|arm64\|amd64" 2>/dev/null | head -10
```

Repository: crossoverJie/cim

Length of output: 90

🏁 Script executed:

```bash
#!/bin/bash
# Check if there's any reference to multi-arch or arm in docker-related configs
git ls-files | grep -E "(docker|build|workflow|\.github)" | head -20
```

Repository: crossoverJie/cim

Length of output: 337

🏁 Script executed:

```bash
#!/bin/bash
cat .github/workflows/docker.yml
```

Repository: crossoverJie/cim

Length of output: 1387


Jaeger binary is hardcoded to linux-amd64, breaking multi-arch CI/CD builds.

The CI/CD workflow explicitly builds for linux/amd64, linux/arm64, and linux/arm/v7, but this Dockerfile step downloads only the amd64 binary, causing arm64 builds to fail. Use TARGETARCH build arg to select the correct architecture:

🐛 Proposed fix

```diff
+ARG TARGETARCH
 # install Jaeger all-in-one (OpenTelemetry trace backend)
 # Jaeger UI: 16686, OTLP gRPC: 4317, OTLP HTTP: 4318
-RUN wget -q https://github.com/jaegertracing/jaeger/releases/download/v1.57.0/jaeger-1.57.0-linux-amd64.tar.gz \
-    && tar -xzf jaeger-1.57.0-linux-amd64.tar.gz \
-    && mv jaeger-1.57.0-linux-amd64/jaeger-all-in-one /usr/local/bin/ \
-    && rm -rf jaeger-1.57.0-linux-amd64*
+RUN JAEGER_ARCH=$(case "${TARGETARCH}" in arm64) echo "arm64";; *) echo "amd64";; esac) \
+    && wget -q https://github.com/jaegertracing/jaeger/releases/download/v1.57.0/jaeger-1.57.0-linux-${JAEGER_ARCH}.tar.gz \
+    && tar -xzf jaeger-1.57.0-linux-${JAEGER_ARCH}.tar.gz \
+    && mv jaeger-1.57.0-linux-${JAEGER_ARCH}/jaeger-all-in-one /usr/local/bin/ \
+    && rm -rf jaeger-1.57.0-linux-${JAEGER_ARCH}*
```

Note: Jaeger v1.57.0 provides only amd64 and arm64 binaries. The linux/arm/v7 platform in the CI/CD workflow cannot be satisfied with this version.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```diff
-# install Jaeger all-in-one (OpenTelemetry trace backend)
-# Jaeger UI: 16686, OTLP gRPC: 4317, OTLP HTTP: 4318
-RUN wget -q https://github.com/jaegertracing/jaeger/releases/download/v1.57.0/jaeger-1.57.0-linux-amd64.tar.gz \
-    && tar -xzf jaeger-1.57.0-linux-amd64.tar.gz \
-    && mv jaeger-1.57.0-linux-amd64/jaeger-all-in-one /usr/local/bin/ \
-    && rm -rf jaeger-1.57.0-linux-amd64*
+ARG TARGETARCH
+# install Jaeger all-in-one (OpenTelemetry trace backend)
+# Jaeger UI: 16686, OTLP gRPC: 4317, OTLP HTTP: 4318
+RUN JAEGER_ARCH=$(case "${TARGETARCH}" in arm64) echo "arm64";; *) echo "amd64";; esac) \
+    && wget -q https://github.com/jaegertracing/jaeger/releases/download/v1.57.0/jaeger-1.57.0-linux-${JAEGER_ARCH}.tar.gz \
+    && tar -xzf jaeger-1.57.0-linux-${JAEGER_ARCH}.tar.gz \
+    && mv jaeger-1.57.0-linux-${JAEGER_ARCH}/jaeger-all-in-one /usr/local/bin/ \
+    && rm -rf jaeger-1.57.0-linux-${JAEGER_ARCH}*
```
🤖 Prompt for AI Agents
In `@docker/allin1-ubuntu.Dockerfile` around lines 13 - 18, The Dockerfile
currently hardcodes the Jaeger artifact "jaeger-1.57.0-linux-amd64.tar.gz" and
will fail for non-amd64 builds; change the step to accept a build ARG
TARGETARCH, map TARGETARCH values to the correct Jaeger archive arch string
(e.g., use "amd64" for TARGETARCH=amd64 and "arm64" for TARGETARCH=arm64, and
fail/skip with a clear message for unsupported TARGETARCH like linux/arm/v7),
then use that mapped arch when composing the download filename and extracting
the "jaeger-all-in-one" binary so the RUN wget/tar/mv/rm sequence fetches the
correct platform-specific tarball.

pom.xml Outdated
Comment on lines 26 to 28

```xml
<micrometer-tracing.version>1.3.0</micrometer-tracing.version>
<micrometer.version>1.13.0</micrometer.version>
<opentelemetry.version>1.38.0</opentelemetry.version>
```

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

```bash
# First, find and examine the pom.xml file
find . -maxdepth 2 -name "pom.xml" -type f | head -5
```

Repository: crossoverJie/cim

Length of output: 185


🌐 Web query:

Spring Boot 3.3.0 micrometer version managed dependency BOM

💡 Result:

Spring Boot 3.3.0’s managed dependency BOM (org.springframework.boot:spring-boot-dependencies:3.3.0) aligns with:

  • Micrometer (metrics): 1.13.0 (i.e., io.micrometer:micrometer-*) [1][2]
  • Micrometer Tracing: 1.3.0 (i.e., io.micrometer:micrometer-tracing-*) [1]

Sources: [1] [2]


🏁 Script executed:

```bash
# Examine the root pom.xml file to see parent version and context
cat -n pom.xml | head -60
```

Repository: crossoverJie/cim

Length of output: 2860


🌐 Web query:

Spring Boot 3.3.0 opentelemetry version managed dependency

💡 Result:

Spring Boot 3.3.0 (via org.springframework.boot:spring-boot-dependencies:3.3.0) manages OpenTelemetry 1.37.x — i.e. the managed OpenTelemetry BOM line is io.opentelemetry:opentelemetry-bom:1.37.0 / opentelemetry.version 1.37.0. [1], [2]

Sources: [1], [2]


Remove redundant Micrometer/Micrometer Tracing version properties; align OpenTelemetry version with Spring Boot 3.3.0.

Spring Boot 3.3.0 already manages micrometer-core 1.13.0 and micrometer-tracing 1.3.0—the versions declared here match exactly and are redundant. Removing these properties will prevent version drift.

However, Spring Boot 3.3.0 manages OpenTelemetry 1.37.0, while this pom.xml declares 1.38.0. Verify whether the newer version is intentional; if not, align with Spring Boot's managed version to maintain consistency and avoid potential incompatibilities in the classpath.

🤖 Prompt for AI Agents
In `@pom.xml` around lines 26 - 28, The POM declares redundant properties
micrometer-tracing.version and micrometer.version (matching Spring Boot 3.3.0
managed versions) and a mismatched opentelemetry.version; remove the
micrometer-tracing.version and micrometer.version properties from the pom.xml so
Spring Boot's dependency management controls those versions, and either change
opentelemetry.version to 1.37.0 to match Spring Boot 3.3.0 or remove the
opentelemetry.version property if you intend to use the Boot-managed version
instead (check for any intentional override before removal).

- Use dedicated gRPC endpoint property (otel.exporter.otlp.grpc.endpoint)
- Add span.makeCurrent() with Scope for proper context propagation
- Add span.recordException() in p2pRoute catch block
- Set span status on login failure early return
- Fix OTLP metrics property (management.otlp.metrics.export.url)
- Externalize tracing sampling probability via env var
- Use volatile for static counter fields in cim-server OtelConfig
- Use @PostConstruct instead of @Bean returning String for metrics
- Await TracerProvider shutdown completion with join()
- Cache Tracer with volatile field in CIMServerHandle
- Support multi-arch Docker builds with TARGETARCH
- Expose Zookeeper/Redis ports in Dockerfile
- Remove Jaeger as hard startup dependency in supervisord
- Remove redundant micrometer version properties from POM
- Update OpenTelemetry version to 1.40.0
- Fix README punctuation

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (3)
cim-server/src/main/java/com/crossoverjie/cim/server/handle/CIMServerHandle.java (1)

38-54: Lazy Tracer initialization is fine for this non-Spring-managed handler, but consider a small robustness improvement.

The double-read of cachedTracer without synchronization is benign (worst case: redundant getBean calls). However, the catch (Exception e) silently swallows all exceptions and returns null, which means configuration errors (e.g., missing bean definition) will be hidden. Consider logging at debug/warn level on first failure so misconfigurations are detectable.

Proposed improvement
         try {
             cachedTracer = SpringBeanFactory.getBean(Tracer.class);
             return cachedTracer;
         } catch (Exception e) {
+            log.debug("Tracer bean not available, tracing disabled: {}", e.getMessage());
             return null;
         }
cim-forward-route/src/main/java/com/crossoverjie/cim/route/config/OtelConfig.java (1)

36-37: Unused meterRegistry field injection.

The meterRegistry field is injected via @Resource but never referenced. The counter @Bean methods receive MeterRegistry as a method parameter (Spring auto-injects it), so this field is dead code.

Proposed fix
-    @jakarta.annotation.Resource
-    private MeterRegistry meterRegistry;
-
cim-forward-route/src/main/java/com/crossoverjie/cim/route/controller/RouteController.java (1)

285-301: Consider adding span tracing to registerAccount for consistency.

All other mutating endpoints (groupRoute, p2pRoute, login) are instrumented with both spans and counters, but registerAccount only has a counter increment. If a registration fails, there's no trace to diagnose it. This is a minor consistency gap.

Comment on lines 155 to 196
    public BaseResponse<NULLBody> p2pRoute(@RequestBody P2PReqVO p2pRequest) {
        BaseResponse<NULLBody> res = new BaseResponse();

        // === OpenTelemetry: create a span for P2P routing ===
        Span span = tracer.spanBuilder("cim.route.p2pMsg").startSpan();

        try (Scope scope = span.makeCurrent()) {
            // Get the routing info of the message receiver
            Optional<CIMServerResVO> cimServerResVO = accountService
                    .loadRouteRelatedByUserId(p2pRequest.getReceiveUserId());
            if (cimServerResVO.isEmpty()) {
                log.warn("userId={} not online, save offline msg", p2pRequest.getReceiveUserId());
                offlineMsgService.saveOfflineMsg(p2pRequest);
                throw new CIMException(OFF_LINE);
            }

            // p2pRequest.getReceiveUserId() ==> the receiver's userID
            ChatReqVO chatVO = new ChatReqVO(p2pRequest.getReceiveUserId(), p2pRequest.getMsg(),
                    p2pRequest.getBatchMsg());
            accountService.pushMsg(cimServerResVO.get(), p2pRequest.getUserId(), chatVO);

            // Record span attributes
            span.setAttribute("senderId", p2pRequest.getUserId());
            span.setAttribute("receiverId", p2pRequest.getReceiveUserId());
            span.setStatus(StatusCode.OK);

            // === Metrics: increment the P2P message counter ===
            p2pMessageCounter.increment();

            res.setCode(StatusEnum.SUCCESS.getCode());
            res.setMessage(StatusEnum.SUCCESS.getMessage());

        } catch (CIMException e) {
            span.setStatus(StatusCode.ERROR, e.getErrorMessage());
            span.recordException(e);
            res.setCode(e.getErrorCode());
            res.setMessage(e.getErrorMessage());
        } finally {
            span.end();
        }
        return res;
    }

⚠️ Potential issue | 🟡 Minor

p2pRoute only catches CIMException — non-CIM exceptions leave span status UNSET.

If a non-CIMException (e.g., NullPointerException, network error) is thrown, it bypasses the catch (CIMException e) block and goes directly to finally, where span.end() is called but the span status remains UNSET and the exception is not recorded. This contrasts with groupRoute and login, which catch all Exception types.

Proposed fix — add a general catch block
         } catch (CIMException e) {
             span.setStatus(StatusCode.ERROR, e.getErrorMessage());
             span.recordException(e);
             res.setCode(e.getErrorCode());
             res.setMessage(e.getErrorMessage());
+        } catch (Exception e) {
+            span.setStatus(StatusCode.ERROR, e.getMessage());
+            span.recordException(e);
+            throw e;
         } finally {
             span.end();
         }
🤖 Prompt for AI Agents
In
`@cim-forward-route/src/main/java/com/crossoverjie/cim/route/controller/RouteController.java`
around lines 155 - 196, p2pRoute currently only catches CIMException so non-CIM
exceptions leave the OpenTelemetry Span with StatusCode.UNSET and unrecorded;
add a general catch (Exception e) after the existing catch to set
span.setStatus(StatusCode.ERROR, e.getMessage()), span.recordException(e), and
populate the response with a generic error code/message (consistent with other
handlers like groupRoute/login), ensuring the catch references the existing Span
variable `span`, the method `p2pRoute`, and the response `res` to mirror the
error handling pattern used elsewhere.

Comment on lines +73 to +94
# OpenTelemetry configuration
management:
  otlp:
    tracing:
      endpoint: http://localhost:4318/v1/traces
    metrics:
      export:
        url: http://localhost:4318/v1/metrics
  tracing:
    sampling:
      probability: ${TRACING_SAMPLING_PROBABILITY:1.0}
  endpoints:
    web:
      exposure:
        include: health,info,metrics

# OpenTelemetry gRPC endpoint for SDK exporter
otel:
  exporter:
    otlp:
      grpc:
        endpoint: ${OTEL_EXPORTER_OTLP_GRPC_ENDPOINT:http://localhost:4317}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find and examine OtelConfig.java
fd -type f -name "OtelConfig.java" -exec cat -n {} \;
echo "---"
# Check the application.yaml file to confirm configuration
fd -type f -name "application.yaml" -path "*/cim-forward-route/*" -exec cat -n {} \;

Repository: crossoverJie/cim

Length of output: 422


🏁 Script executed:

#!/bin/bash
# Find OtelConfig.java anywhere in the repo
find . -name "OtelConfig.java" -type f
echo "---"
# Also search for files containing OpenTelemetrySdk
rg -l "OpenTelemetrySdk" --type java
echo "---"
# Check for application.yaml in the cim-forward-route module
find . -path "*/cim-forward-route/*/application.yaml" -type f
echo "---"
# Look for pom.xml in cim-forward-route to check dependencies
find . -path "*/cim-forward-route/pom.xml" -type f

Repository: crossoverJie/cim

Length of output: 531


🏁 Script executed:

#!/bin/bash
# Examine OtelConfig.java in cim-forward-route
cat -n cim-forward-route/src/main/java/com/crossoverjie/cim/route/config/OtelConfig.java

Repository: crossoverJie/cim

Length of output: 4419


🏁 Script executed:

#!/bin/bash
# Check pom.xml for Spring Boot version and OpenTelemetry dependencies
cat -n cim-forward-route/pom.xml

Repository: crossoverJie/cim

Length of output: 6444


🏁 Script executed:

#!/bin/bash
# Read the application.yaml file for cim-forward-route
cat -n cim-forward-route/src/main/resources/application.yaml
echo "---"
# Check if there are any autoconfiguration exclusions
rg -n "spring.autoconfigure.exclude" cim-forward-route/

Repository: crossoverJie/cim

Length of output: 2881


🏁 Script executed:

#!/bin/bash
# Check if GlobalOpenTelemetry.set() is called
rg -n "GlobalOpenTelemetry.set" --type java
echo "---"
# Check parent pom.xml for Spring Boot version
cat cim/pom.xml | head -50

Repository: crossoverJie/cim

Length of output: 124


🏁 Script executed:

#!/bin/bash
# Find the parent pom.xml
find . -name "pom.xml" -path "*/cim/*" -type f | head -5
echo "---"
# Check the root pom.xml
ls -la pom.xml 2>/dev/null && cat -n pom.xml | head -100

Repository: crossoverJie/cim

Length of output: 4736


🏁 Script executed:

#!/bin/bash
# Check for any spring.autoconfigure.exclude in the parent or module poms
rg -n "spring.autoconfigure.exclude" --type xml
echo "---"
# Check for how the Tracer bean is being used in the codebase
rg -n "private.*Tracer\|@Autowired.*Tracer\|@Resource.*Tracer" --type java cim-forward-route/src/main/java/
echo "---"
# Check if spring-boot-starter-actuator triggers auto-config
rg -n "spring.boot.actuate" --type java cim-forward-route/src/main/java/ | head -20

Repository: crossoverJie/cim

Length of output: 67


🏁 Script executed:

#!/bin/bash
# Search for any usage of the Tracer bean in cim-forward-route
rg -n "tracer\|Tracer" --type java cim-forward-route/src/main/java/ | grep -v "TracerProvider\|OpenTelemetry" | head -30
echo "---"
# Check if there are any span creations
rg -n "startSpan\|startActiveSpan" --type java cim-forward-route/src/main/java/

Repository: crossoverJie/cim

Length of output: 63


🏁 Script executed:

#!/bin/bash
# Check cim-server's OtelConfig to see if it follows the same pattern
cat -n cim-server/src/main/java/com/crossoverjie/cim/server/config/OtelConfig.java | head -70
echo "---"
# Check if cim-server's application.yaml has similar configuration
cat -n cim-server/src/main/resources/application.yaml | grep -A 25 "OpenTelemetry"

Repository: crossoverJie/cim

Length of output: 3954


🏁 Script executed:

#!/bin/bash
# Check if cim-server uses the Tracer bean
rg -n "Tracer" cim-server/src/main/java/ --type java | grep -v "TracerProvider\|@Bean" | head -20
echo "---"
# Check how GlobalOpenTelemetry is used or if it's set anywhere
rg -n "GlobalOpenTelemetry\|setGlobal" --type java

Repository: crossoverJie/cim

Length of output: 1948


🏁 Script executed:

#!/bin/bash
# Check if cim-forward-route's RouteHandler or MessageHandler uses Tracer
rg -n "import.*Tracer|new.*Span|startSpan|startActiveSpan" cim-forward-route/src/main/java/ --type java
echo "---"
# Check for Tracer usage in cim-forward-route via SpringBeanFactory or direct injection
rg -n "@Autowired.*Tracer|@Resource.*Tracer|getBean.*Tracer" cim-forward-route/src/main/java/ --type java
echo "---"
# List main classes in cim-forward-route to check if Tracer is injected
ls -la cim-forward-route/src/main/java/com/crossoverjie/cim/route/*/

Repository: crossoverJie/cim

Length of output: 3642


🏁 Script executed:

#!/bin/bash
# Check how tracer is injected in RouteController
cat -n cim-forward-route/src/main/java/com/crossoverjie/cim/route/controller/RouteController.java | head -50

Repository: crossoverJie/cim

Length of output: 2603


🏁 Script executed:

#!/bin/bash
# Check the field declarations in RouteController
sed -n '40,120p' cim-forward-route/src/main/java/com/crossoverjie/cim/route/controller/RouteController.java

Repository: crossoverJie/cim

Length of output: 2486


Dual tracing setup exports spans to different endpoints — consolidate to single configuration.

The application configures two separate OpenTelemetry tracing paths:

  1. Spring Boot auto-configuration (via micrometer-tracing-bridge-otel on classpath + management.otlp.tracing.endpoint in YAML) auto-instruments HTTP requests and exports spans via HTTP to http://localhost:4318/v1/traces.

  2. Manual SDK in OtelConfig creates a separate OpenTelemetrySdk with SdkTracerProvider and OtlpGrpcSpanExporter, exporting spans via gRPC to http://localhost:4317 (configured by otel.exporter.otlp.grpc.endpoint).

RouteController explicitly injects and uses the manual Tracer bean to create spans for group messaging, P2P messaging, and login operations. This means spans from these operations use the manual gRPC path (port 4317), while Spring Boot's auto-instrumented request/response tracing uses the HTTP path (port 4318). If only one collector is running, you'll lose traces; if both are running, traces are split across endpoints and difficult to correlate.

Choose one approach:

  • Option A: Rely solely on Spring Boot's auto-configuration by removing OtelConfig and letting Spring Boot handle all tracing.
  • Option B: Disable Spring Boot's auto-configuration (spring.autoconfigure.exclude) and keep only OtelConfig for manual control.
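Option B might look roughly like the following in `application.yaml` (a sketch only — the exact auto-configuration class name varies by Spring Boot version, so verify it against the actuator autoconfigure package actually on the classpath):

```yaml
spring:
  autoconfigure:
    exclude:
      # Disables Spring Boot's OTLP tracing auto-config so that only the
      # manual OtelConfig SDK (gRPC, port 4317) exports spans.
      # Class name shown is for Spring Boot 3.2/3.3 era releases.
      - org.springframework.boot.actuate.autoconfigure.tracing.otlp.OtlpAutoConfiguration

# Kept as the sole exporter endpoint for the manual SDK
otel:
  exporter:
    otlp:
      grpc:
        endpoint: ${OTEL_EXPORTER_OTLP_GRPC_ENDPOINT:http://localhost:4317}
```

With this in place the `management.otlp.tracing.*` keys become dead configuration and should be removed.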
🤖 Prompt for AI Agents
In `@cim-forward-route/src/main/resources/application.yaml` around lines 73 - 94,
The YAML and manual SDK both configure tracing causing spans to be exported to
different endpoints; consolidate to one approach by either removing the custom
OtelConfig and its OpenTelemetrySdk/SdkTracerProvider/OtlpGrpcSpanExporter so
all spans use Spring Boot auto-config (management.otlp.tracing.endpoint ->
http://localhost:4318/v1/traces) and inject the auto-configured Tracer into
RouteController, or disable Spring Boot tracing auto-config (via
spring.autoconfigure.exclude) and keep OtelConfig
(otel.exporter.otlp.grpc.endpoint -> http://localhost:4317) as the sole
tracer—update RouteController to use whichever Tracer bean remains and remove
the unused tracing configuration to avoid dual export.

    otlp:
      grpc:
        endpoint: ${OTEL_EXPORTER_OTLP_GRPC_ENDPOINT:http://localhost:4317}


⚠️ Potential issue | 🟡 Minor

Trailing blank line at end of file.

YAMLlint flags an extra blank line at the end. Remove it to satisfy the linter.

🧰 Tools
🪛 YAMLlint (1.38.0)

[warning] 95-95: too many blank lines (1 > 0)

(empty-lines)

🤖 Prompt for AI Agents
In `@cim-forward-route/src/main/resources/application.yaml` at line 95, Remove the
trailing blank line at the end of application.yaml so the file ends immediately
after the last YAML content; open application.yaml, delete the extra empty
newline/whitespace after the final entry, save, and re-run yamllint to confirm
no EOF blank-line warning remains.

@crossoverJie
Owner

Sorry for the late reply, please check the CI.

Spring Boot BOM does not manage micrometer-tracing-bridge-otel and
micrometer-registry-otlp versions. Restored version properties with
updated versions (micrometer-tracing 1.4.2, micrometer 1.14.3).
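The restored version pins would look roughly like this in the root `pom.xml` (a sketch using the versions named above; the property names are illustrative, and the artifact coordinates are the standard `io.micrometer` ones):

```xml
<properties>
    <!-- Not managed by the Spring Boot BOM, so pinned explicitly -->
    <micrometer-tracing.version>1.4.2</micrometer-tracing.version>
    <micrometer.version>1.14.3</micrometer.version>
</properties>

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>io.micrometer</groupId>
            <artifactId>micrometer-tracing-bridge-otel</artifactId>
            <version>${micrometer-tracing.version}</version>
        </dependency>
        <dependency>
            <groupId>io.micrometer</groupId>
            <artifactId>micrometer-registry-otlp</artifactId>
            <version>${micrometer.version}</version>
        </dependency>
    </dependencies>
</dependencyManagement>
```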