Description
Problem Description:
We have a setup: Client -> Spring Cloud Gateway -> Backend Service (host: 122). There is only one upstream service, and all requests are routed to host 122.
Starting from 2025-10-16 11:01, all client requests began experiencing 504 Timeouts. The client's timeout is set to 30 seconds, while the gateway and backend service timeouts are configured for 300 seconds.
Then, from 2025-10-16 13:47, all client requests started receiving 500 Errors, indicating the gateway failed to forward the requests. The exception stack trace is as follows:
reactor.netty.internal.shaded.reactor.pool.PoolAcquireTimeoutException: Pool#acquire(Duration) has been pending for more than the configured timeout of 1000ms
at reactor.netty.internal.shaded.reactor.pool.AbstractPool$Borrower.run(AbstractPool.java:418)
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Error has been observed at the following site(s):
*__checkpoint ⇢ org.springframework.cloud.gateway.filter.WeightCalculatorWebFilter [DefaultWebFilterChain]
*__checkpoint ⇢ com.alibaba.csp.sentinel.adapter.spring.webflux.SentinelWebFluxFilter [DefaultWebFilterChain]
*__checkpoint ⇢ org.springframework.web.filter.reactive.ServerHttpObservationFilter [DefaultWebFilterChain]
*__checkpoint ⇢ HTTP POST "/life/xxx" [ExceptionHandlingWebHandler]
Original Stack Trace:
at reactor.netty.internal.shaded.reactor.pool.AbstractPool$Borrower.run(AbstractPool.java:418)
at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:68)
at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:28)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
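To make the failure mode in the trace concrete, here is a minimal toy model of a fixed-size pool with an acquire timeout. It is not Reactor Netty code: `ToyFixedPool` and its methods are hypothetical names, and a `Semaphore` stands in for the real connection pool. It only illustrates that once every connection is held and never released, each new acquire waits for the full timeout and then fails, which matches what `PoolAcquireTimeoutException` reports.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class PoolExhaustionSketch {

    /** Hypothetical toy pool: each permit stands for one upstream connection. */
    static final class ToyFixedPool {
        private final Semaphore permits;
        private final long acquireTimeoutMillis;

        ToyFixedPool(int maxConnections, long acquireTimeoutMillis) {
            this.permits = new Semaphore(maxConnections);
            this.acquireTimeoutMillis = acquireTimeoutMillis;
        }

        /** Returns true if a connection was obtained before the acquire timeout elapsed. */
        boolean acquire() {
            try {
                return permits.tryAcquire(acquireTimeoutMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }

        /** Returns a connection to the pool. A leak is an acquire with no matching release. */
        void release() {
            permits.release();
        }

        int idle() {
            return permits.availablePermits();
        }
    }

    public static void main(String[] args) {
        // Small numbers for illustration; the reported setup uses 10000 and 1000 ms.
        ToyFixedPool pool = new ToyFixedPool(3, 100);

        // Simulate a leak: three connections are acquired and never released.
        for (int i = 0; i < 3; i++) {
            pool.acquire();
        }

        // The pool is exhausted, so this acquire waits 100 ms and then fails.
        boolean acquired = pool.acquire();
        System.out.println("idle=" + pool.idle() + " acquiredAfterLeak=" + acquired);

        // Releasing a single connection makes the next acquire succeed again.
        pool.release();
        System.out.println("acquiredAfterRelease=" + pool.acquire());
    }
}
```

In this model the pool never recovers on its own, which is consistent with the errors persisting once the limit is reached.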
Our Investigation:
1. Gateway Configuration:
spring.cloud.gateway.httpclient.pool.type=FIXED
spring.cloud.gateway.httpclient.pool.acquire-timeout=1000
spring.cloud.gateway.httpclient.pool.max-connections=10000
spring.cloud.gateway.httpclient.pool.max-idle-time=10000
spring.cloud.gateway.httpclient.response-timeout=PT300S
spring.cloud.gateway.httpclient.connect-timeout=5000
spring.codec.max-in-memory-size=5MB
2. The metric http_client_requests_active_seconds_active_count started rising at 11:01 and reached the configured maximum of 10000 at 13:47, at which point the 500 errors (request forwarding failures) began.
3. Analysis of TCP packets shows that the gateway sent its last packet at 11:05. The backend service (122) then sent an RST packet at 11:10 (exactly 300 seconds later). After this RST, the gateway did not send any further TCP packets to host 122, even though clients continued to send requests (which all timed out).
4. We have an internal HTTP endpoint exposed via Spring Web in the gateway for operational tasks (e.g., cache updates by DevOps). Around the time the issue started (approximately 11:02), we observed warnings that point to a possible SkyWalking version incompatibility when this internal endpoint was called. We have not confirmed that this causes a connection leak, but the timing correlation is notable.
Relevant Log Snippet (2025-10-16 11:02:30.898):
[reactor-http-epoll-2] [WARN ] ... java.lang.NoSuchMethodError: 'org.springframework.http.HttpStatus org.springframework.http.server.reactive.ServerHttpResponse.getStatusCode()'
at org.apache.skywalking.apm.plugin.spring.mvc.v5.InvokeInterceptor.lambda$afterMethod$0(InvokeInterceptor.java:73)
... (full stack trace provided in the original description)
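As a rough cross-check of the timestamps above, the following sketch estimates how fast pooled connections were being consumed and confirms the 300-second gap before the RST. It assumes the active count started near zero at 11:01 and grew roughly linearly, which the metric description suggests but does not prove. The class and method names are hypothetical.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.util.Locale;

public class LeakRateEstimate {

    /** Seconds elapsed between two times on the same day. */
    static long secondsBetween(LocalTime start, LocalTime end) {
        return Duration.between(start, end).getSeconds();
    }

    /** Average connections consumed per second, assuming linear growth from zero. */
    static double leaksPerSecond(LocalTime start, LocalTime exhausted, int maxConnections) {
        return (double) maxConnections / secondsBetween(start, exhausted);
    }

    /** Time at which a peer with the given timeout would give up on a silent connection. */
    static LocalTime expectedReset(LocalTime lastPacket, Duration timeout) {
        return lastPacket.plus(timeout);
    }

    public static void main(String[] args) {
        LocalTime firstTimeout = LocalTime.of(11, 1);    // 504s begin, metric starts rising
        LocalTime poolExhausted = LocalTime.of(13, 47);  // metric reaches 10000, 500s begin
        int maxConnections = 10_000;                     // pool.max-connections

        System.out.println("window seconds = " + secondsBetween(firstTimeout, poolExhausted));
        System.out.println(String.format(Locale.ROOT, "approx leaks/sec = %.2f",
                leaksPerSecond(firstTimeout, poolExhausted, maxConnections)));

        // Last gateway packet at 11:05 plus the 300 s backend timeout.
        System.out.println("expected RST at = "
                + expectedReset(LocalTime.of(11, 5), Duration.ofSeconds(300)));
    }
}
```

Under these assumptions the window is 9960 seconds, so the pool was drained at about one connection per second. That would be consistent with every forwarded request holding a connection that was never returned, although the data here cannot confirm it.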
Request for Assistance:
Our investigation is currently stalled. We would appreciate help from the community in identifying the root cause of this issue. Any guidance on potential causes or further troubleshooting steps would be greatly appreciated.