
The Polling System CPU overhead #4328

Open
DeviLab opened this issue Mar 28, 2025 · 16 comments · May be fixed by #4377

@DeviLab

DeviLab commented Mar 28, 2025

The new polling system looks interesting for cases where libraries actually use it, but when I updated my app to CE 3.6 I saw a significant CPU increase (more than 70%). According to the profiler it is caused by EPoll-related calls from the Cats Effect code. When I overrode the polling system to SleepSystem in the main class, the problem went away and CPU consumption was back to normal:

import cats.effect.unsafe.{PollingSystem, SleepSystem}
// ...

object Main extends IOApp {
  override protected def pollingSystem: PollingSystem = SleepSystem
  // ...
}

There are still a lot of apps and libraries which use polling mechanisms directly and can't be re-implemented to use the new feature. It means that users may see performance degradation just because they bumped Cats Effect to 3.6. Do you have any ideas how this issue can be addressed? Does every app which doesn't rely on the polling system require such an explicit change?

@djspiewak
Member

djspiewak commented Mar 28, 2025

I would consider this to be a bug.

Is this on JVM or Native? I would assume JVM? We do have the capability to gate polling based on whether any events have actually been registered by higher level code, but we don't actually do that yet. That is likely the solution here. In the meantime, the workaround (explicitly swapping to SleepSystem) feels correct, albeit annoying.

@djspiewak
Member

djspiewak commented Mar 28, 2025

Actually correction: we did implement the event gating for the SelectorSystem, though only for the hot path variant (not the parking). Do you have profiling results which indicate which call is bearing all the load? In particular, are you seeing increased overhead in parkLoop or directly in run (in WorkerThread)? Also is it at all possible that parts of your app are using fs2-io but other parts aren't? So then you could be in a situation where you're using the polling system for a few small things but not the major stuff.

@DeviLab
Author

DeviLab commented Mar 28, 2025

Yes, it's JVM. My profiling results point at various methods which call some EPoll API and consume more CPU than in 3.5, but nothing related to parkLoop. For example, WorkerThread.parkUntilNextSleeper calls EPollSelectorImpl.processEvents and consumes extra CPU:

[profiler screenshots: 3.6 vs 3.5]

We do use fs2-io as a part of http4s-ember-client, but it's an old version (3.11), which doesn't support the polling system yet.

@djspiewak
Member

Yeah it's definitely the park loop. Need to think about this a bit. The tricky thing here is that waking up a thread becomes an exercise in statefulness if we make this bit contingent on outstanding events. I think that's definitely the right thing to do though.

@iRevive
Contributor

iRevive commented Mar 28, 2025

Could you check poller/worker-thread metrics by any chance?

They are available via MBean out of the box.
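
For anyone who wants to dump those beans without wiring up a metrics exporter, here is a minimal sketch using the standard JMX API from inside the running app. The cats.effect.unsafe.metrics object-name domain is an assumption on my part; check jconsole/VisualVM for the exact names your runtime registers.

import java.lang.management.ManagementFactory
import javax.management.ObjectName
import scala.jdk.CollectionConverters._

object CeMBeanDump {
  // Call this from inside the application's own JVM (or adapt it to a remote JMX connection).
  def dump(): Unit = {
    val server = ManagementFactory.getPlatformMBeanServer
    // Assumed domain for the runtime's MBeans; verify the exact pattern in a JMX console.
    val pattern = new ObjectName("cats.effect.unsafe.metrics:*")
    server.queryNames(pattern, null).asScala.foreach { name =>
      val attrs  = server.getMBeanInfo(name).getAttributes.map(_.getName)
      val values = attrs.map(a => s"$a=${server.getAttribute(name, a)}")
      println(s"$name -> ${values.mkString(", ")}")
    }
  }
}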

@armanbilge
Member

@DeviLab sorry if I missed it, what is your Linux version and your JVM version?

@DeviLab
Author

DeviLab commented Mar 31, 2025

bash-5.1$ uname -a
Linux preprod-***-7ddb768f67-kp9xt 5.14.0-427.37.1.el9_4.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Sep 24 08:06:42 PDT 2024 x86_64 x86_64 x86_64 GNU/Linux
bash-5.1$ java -version
java version "21.0.6" 2025-01-21 LTS
Java(TM) SE Runtime Environment (build 21.0.6+8-LTS-188)
Java HotSpot(TM) 64-Bit Server VM (build 21.0.6+8-LTS-188, mixed mode, sharing)

@DeviLab
Author

DeviLab commented Mar 31, 2025

@iRevive, I can't check it via MBean, but I checked them directly in IORuntime: all poller metrics are 0.

@iRevive
Contributor

iRevive commented Apr 2, 2025

I've observed increased CPU usage, too.

Service A (has traffic). CE 3.6 was deployed at 09:05.

[CPU usage graphs]

Service B (nearly zero traffic). CE 3.6 was deployed at 12:50.

[CPU usage graphs]

[Worker thread graphs]

[Local queue graphs]

[Timer heap graph]

The following change brought CPU usage back to normal:

override protected def pollingSystem: PollingSystem = SleepSystem

@djspiewak
Member

@iRevive First off, you have lovely dashboards. Second, am I reading this right that when you deployed 3.6, suddenly a ton of actual work materialized on the work queues? Something is really, really odd then. I can't imagine what work could have just come out of nowhere; this is different from the selector overhead.

I wonder if it's worth grabbing a fiber dump just to eyeball what's even going on during the idle time?

@iRevive
Contributor

iRevive commented Apr 2, 2025

Second, am I reading this right that when you deployed 3.6, suddenly a ton of actual work materialized on the work queues?

We weren't using IO Runtime metrics before 3.6. But I will try to grab a fiber dump.

@djspiewak
Member

Had a chat with @armanbilge last night and I'm fairly certain the optimal way to resolve this is by conditionally parking using more efficient methods and then dealing with the stateful race condition on the interrupter side. It's very tricky and adds meaningful complexity but it's not at all impossible, and it should mean that applications which don't touch the PollingSystem end up seeing zero overhead.

djspiewak self-assigned this Apr 13, 2025
djspiewak added this to the v3.6.0 milestone Apr 13, 2025
@djspiewak
Member

@DeviLab Would you be able to test with 3.6.1-25-d807be0 by any chance? You can remove your override def pollingSystem workaround. You should see results (with this snapshot) that are very close to 3.5.x, though I wouldn't be surprised if there's a slight step back relative to that baseline.
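
(For anyone else who wants to try a hash-suffixed build like this, a minimal build.sbt sketch; it assumes the build is published to the Sonatype snapshots repository, which may not match how these builds are actually distributed:)

// build.sbt (sketch; the snapshot repository is an assumption)
resolvers ++= Resolver.sonatypeOssRepos("snapshots")
libraryDependencies += "org.typelevel" %% "cats-effect" % "3.6.1-25-d807be0"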

@DeviLab
Author

DeviLab commented Apr 14, 2025

I tested your fix. Initially I saw almost no difference between the fix and vanilla 3.6 without the overridden pollingSystem. Then I looked at your PR and realized that this was probably because we recently updated fs2 to 3.12 (which has the polling system integration). I rolled back fs2 to 3.11 and it worked perfectly. So the fix works, but only when the polling system is not used at all. In our case we have http4s-ember-client (which uses fs2-io under the hood) in the project, but it's not so heavily loaded: as long as fs2 doesn't rely on the polling system from CE, only one thread is used to call EPoll.wait, but when I set pollingSystem to SelectorSystem, every WorkerThread begins doing so. It would be nice to have a mechanism which would make polling not so aggressive.

[profiler screenshot]

@armanbilge
Member

armanbilge commented Apr 14, 2025

Thank you so much for testing and sharing results!

In our case we have http4s-ember-client (which uses fs2-io under the hood) in the project, but it's not so heavily loaded ... but when I set pollingSystem to SelectorSystem, every WorkerThread begins doing so. It would be nice to have a mechanism which would make polling not so aggressive.

That's interesting. In Daniel's #4377, a WorkerThread only transitions to polling (instead of ordinary sleeping) if there is outstanding I/O on that thread, and it will transition back once the I/O is completed. In other words, it's already not very aggressive, because each thread decides individually and dynamically whether to use polling every time it sleeps.

So if all the worker threads are using polling all the time, then it's because there are lots of I/O events happening all the time.
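
To make that gating concrete, here is a heavily simplified, illustrative sketch of the idea. It is not the actual WorkerThread code from #4377; the names (GatedParker, Poller, Sleeper, outstandingEvents) are invented for illustration.

// Simplified illustration of per-thread gating between polling and plain sleeping.
// This is NOT the real cats-effect WorkerThread implementation.
final class GatedParker(poller: Poller, sleeper: Sleeper) {
  // I/O events registered on this thread and not yet completed
  private[this] var outstandingEvents: Int = 0

  def registerEvent(): Unit = outstandingEvents += 1
  def completeEvent(): Unit = outstandingEvents -= 1

  // Called when the worker runs out of fibers to execute.
  def park(timeoutNanos: Long): Unit =
    if (outstandingEvents > 0)
      poller.poll(timeoutNanos)   // must poll: completions arrive only via the selector/epoll
    else
      sleeper.sleep(timeoutNanos) // nothing outstanding: cheap timed park, no selector involved
}

trait Poller  { def poll(timeoutNanos: Long): Unit }
trait Sleeper { def sleep(timeoutNanos: Long): Unit }

The real implementation additionally has to deal with the wake-up race Daniel mentioned above: another thread that wants to unpark a worker has to know which way it parked.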

@djspiewak
Member

Two of the core assumptions of the integrated runtime are that 1) spreading data across all CPU cores is good, and 2) context shifts are super expensive. So it doesn't actually take that many I/O events to end up in a situation where every thread needs to poll, because the alternative would have been either asymmetric load (in continuation handling) or more context shifts (bouncing to another CPU after event completion).

You can see some hints of this phenomenon in the graphs actually. I would guess that ember-client isn't really fully saturating the cores, and some of our parking is probably simple (non-polling), which is why we see some irregularity in the CPU load. One plausible theory here is that we may be hitting a bit of an uncanny valley in your application: just enough polling system use to force us to take the penalty of selector use, but just enough non-polling system I/O that the penalty might not pay for itself. I understand you're still using Blaze for the server?

I would love to understand whether the end-to-end latency and throughput metrics are regressed going from fs2 3.11 to 3.12, or if this is just something which is showing up in the CPU utilization. To be clear, we know that Selector is really inefficient. There may be some ways we can improve that, but the endgame here is to bypass it entirely and go directly to epoll, which should resolve much of this incremental overhead and make the uncanny valley a lot narrower.
