Dynamically bypass selector polling if no I/O events are present #4377
base: series/3.6.x
Conversation
Released as …
I'd like to see the real world numbers on this one, but the microbenchmarks seem to give their blessing in so far as they go.

Before: [benchmark results]

After: [benchmark results]

Note that the global EC is 4% slower in this run, but …
```scala
val polling = st eq ParkedSignal.ParkedPolling
val simple = st eq ParkedSignal.ParkedSimple

if ((polling || simple) && signal.compareAndSet(st, ParkedSignal.Interrupting)) {
```
The semantics here are different from the old implementation: `getAndSet` involves a loop, but there is no longer a loop here.

In the (admittedly unlikely) scenario that a worker thread transitions from `ParkedPolling` to `ParkedSimple` or vice versa while evaluating this condition, we would prematurely give up on attempting to wake up this thread. We might consider looping if the `compareAndSet` fails.
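(For illustration only, a retry loop might look roughly like this; `signal`, `ParkedSignal`, and the surrounding names are taken from the diff above, and this is a sketch rather than a proposed patch:)

```scala
// Sketch of a CAS retry loop (assuming signal is an
// AtomicReference[ParkedSignal]). Re-reading the state on CAS failure
// means a concurrent ParkedPolling <-> ParkedSimple transition cannot
// make us give up on a thread that is still parked.
@annotation.tailrec
def tryBeginInterrupt(): Boolean = {
  val st = signal.get()
  val polling = st eq ParkedSignal.ParkedPolling
  val simple = st eq ParkedSignal.ParkedSimple
  if (polling || simple) {
    if (signal.compareAndSet(st, ParkedSignal.Interrupting)) true
    else tryBeginInterrupt() // the state moved under us; re-examine it
  } else {
    false // not parked (anymore); genuinely nothing to wake
  }
}
```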
I think that might not be worth it. Going from `ParkedPolling` to `ParkedSimple` would require that, between the `get()` above and the CAS, the other thread wake itself up (or be awakened by another notify), run through its whole state machine, pick up a fiber, execute that fiber through the whole machinery, run out of work, go through the whole "check all the things for work" step (which includes doing the very thing that we're trying to notify it to do here!), and then finally go back and park again, all in the space of just these few lines.

Aside from the fact that this race condition is almost inconceivable, it effectively just results in a loss of performance, since we give up on the thread and go wake another one. The work item we're trying to notify on would actually get picked up (before the target thread resuspends), and we'll ultimately wake the same number of threads.

Compare this to an actual CAS loop, which would require more complexity here and, I think, an extra conditional jump, despite the fact that it's effectively never going to be hit and wouldn't matter even if it did. So I think I'd rather just eat the CAS failure.
> Aside from the fact that this race condition is almost inconceivable, it effectively just results in a loss of performance since we give up on the thread and go wake another one.

Sure ... I am just worried about relying on the existence of another thread. For example, if there is exactly one worker thread in the runtime, could we end up in a deadlock?

I have thought through it a few times and I have not come up with a scenario yet that could deadlock, but I don't feel confident.
If there's only one worker then that worker is either awake already (in which case we can't have entered this conditional) or it's parked and we're notifying it (in which case all the normal rules apply).
> or it's parked and we're notifying it (in which case all the normal rules apply).

Right, so if all the normal rules are applying ...

> Aside from the fact that this race condition is almost inconceivable, it effectively just results in a loss of performance since we give up on the thread and go wake another one.

What happens if we fail the CAS due to the race condition, we don't loop, and then we don't go to the next thread because there is no next thread? We've given up on our one-and-only thread.
Your words precisely capture my feelings about this:

> Now, I think you're probably right that nothing is lost here since the thread will both check the external queue and attempt to steal before going back to sleep, but it's hard for me to 100% convince myself that this is safe in all circumstances. This is very similar to the multi-consumer queue cases, which are genuinely insanely subtle and we've been bitten quite a bit with lost notifications. Speaking more generally, we tend to always bias notifications in favor of over-notifying rather than under-notifying, since the worst case in the former is a bit less performance, while the worst case in the latter is a deadlock.
Co-authored-by: Arman Bilge <[email protected]>
So my interpretation of the results from #4328 is that this is indeed an improvement, and we likely have some more work to do on …
I'm probably missing something, but do we really need the `ParkedSignal.Interrupting` state? What does the spinlock actually protect? (Spurious wakeups already must be handled.)

(Btw, I've restarted the "windows-latest, 2.13.15, temurin@17, ciJVM" job, because it timed out. But I'm wondering if it was a real failure.)
> I'm probably missing something, but do we really need the `ParkedSignal.Interrupting` state? What does the spinlock actually protect? (Spurious wakeups already must be handled.)

@durban My understanding is this protects us against the case where the interruptor reads that the worker thread is in the "polling" or "simple" state, but by the time it invokes the appropriate interrupt action, the worker thread is parked in a different state (and thus the interrupt is ineffective).
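(To illustrate the race being described, a sketch of the interleaving, with names assumed from the diff above:)

```scala
// Sketch of the lost-interrupt race without an Interrupting state:
//
//   interruptor                      worker
//   -----------                      ------
//   st = signal.get()
//     // observes ParkedPolling
//                                    wakes on its own, runs its state
//                                    machine, finds no work, and re-parks
//                                    as ParkedSimple
//   system.interrupt(...)
//     // interrupts the poller, but the worker is now in
//     // LockSupport.park: the notification is lost
//
// Holding the worker's signal at Interrupting pins the parking mode until
// the interruptor has issued the matching wake-up action.
```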
```scala
// just create a bit of chaos with timers and async completion
val sleeps = 0.until(10).map(i => IO.sleep((i * 10).millis)).toList

val latch = IO.deferred[Unit].flatMap(d => d.complete(()).start *> d.get)
val latches = 0.until(10).map(_ => latch).toList

val test = (sleeps ::: latches).parSequence.parReplicateA_(100)
```
I think we need a bit more chaos: specifically, exercising the external queue somehow.

Also, I wonder if we should have a version of this test where the runtime has exactly one worker; that's how we've caught various bugs before.
I'm not averse to running a version of the test with only one worker. I'll try tickling the external queue first.
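(As a sketch of what "tickling the external queue" might look like: completing async callbacks from a non-worker thread should route the resumed fibers through the external queue. The names here are illustrative, not from the PR:)

```scala
import java.util.concurrent.{Executors, TimeUnit}
import cats.effect.IO

// Completing the callback from a foreign (non-worker) thread means the
// resumed fiber has to be submitted to the pool from outside, which goes
// through the external queue.
val foreign = Executors.newSingleThreadScheduledExecutor()

val external: IO[Unit] = IO.async_[Unit] { cb =>
  val task: Runnable = () => cb(Right(()))
  foreign.schedule(task, 1L, TimeUnit.MILLISECONDS)
  ()
}

val externals: List[IO[Unit]] = List.fill(10)(external)
```

Something like `externals` could then be mixed into the `sleeps ::: latches` list above.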
Okay, so the way to wake someone up is (without this PR) essentially:

```scala
if (signal.getAndSet(false)) system.interrupt(...)
```

With the 2 different parking states it could be:

```scala
val old = signal.getAndSet(Unparked)
// HERE
if (old eq ParkedPolling) system.interrupt(...)
else if (old eq ParkedSimple) LockSupport.unpark(...)
```

And yes, it could happen that at the point marked with `// HERE` …

(Again, I might be missing something, I just started thinking about this...)
```scala
def parkLoop(needsPoll: Boolean): Boolean = {
  if (done.get()) {
    false
  } else {
    // Park the thread until further notice.
    val start = System.nanoTime()
    metrics.incrementPolledCount()
```
Shouldn't this metric be incremented only if `needsPoll == true`?
Good catch! Yes.
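(i.e., a minimal sketch of the guarded version, against the diff above:)

```scala
// only count this parking as a poll when we actually intend to poll
if (needsPoll) {
  metrics.incrementPolledCount()
}
```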
@durban I think we need `Interrupting` for two reasons.

First, when the …

The second reason here is the situation @armanbilge outlined: the pool decides to awaken a thread which is parked polling, that thread wakes on its own, does its thing, goes all the way through its state machine, and then parks again without polling before the pool has the ability to wake it. In this situation, a notification ends up being "lost". Now, I think you're probably right that nothing is lost here since the thread will both check the external queue and attempt to steal before going back to sleep, but it's hard for me to 100% convince myself that this is safe in all circumstances. This is very similar to the multi-consumer queue cases, which are genuinely insanely subtle and we've been bitten quite a bit with lost notifications. Speaking more generally, we tend to always bias notifications in favor of over-notifying rather than under-notifying, since the worst case in the former is a bit less performance, while the worst case in the latter is a deadlock.

I'm not saying we shouldn't try to reason through this, and if we can find a way to remove the …
I tried …

CPU Usage

[graph]

Although the graph shows the mean value as 20%, the startup inflates the number. The more realistic number is ~13% when the system is idling.

Events per worker

[graph]

The sum of all events (blocked, parked, polled) across all workers. Ah, the poll number should be even lower, if I get it right: #4377 (comment).
@djspiewak Yeah, I haven't looked at …

Something else: https://github.com/typelevel/cats-effect/actions/runs/14459016996/job/40548044386?pr=4377#step:22:4843

This is the second time I've seen this branch hang in …
This does feel like something real, but it's definitely kinda nuts. It also surprises me that …
With this change, every time we park a thread we check to see if the underlying polling system `needsPoll`, which is a hint that is intended to be implemented by checking if any events have been registered with the underlying polling system. If it returns `false`, we don't even bother hitting the poller and instead park the worker thread the old-fashioned way. This has some slightly complex concurrent coordination implications which makes this whole thing just a bit more complex and introduces a tiny spin wait, but otherwise is generally okay.

This should bring thread suspension performance for applications which are not using `PollingSystem` back very close to what it was in 3.5.x. The `needsPoll` call does have some overhead, and the more complex suspension state does have a cost, but it shouldn't be that severe. I'm traveling so I haven't had a chance to run benchmarks yet, but that's next on my todo list.

Fixes #4328
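(To summarize the description above in code: a rough sketch with assumed names (`system`, `poller`, `pool`, `reportFailure`), not the PR's actual implementation:)

```scala
import java.util.concurrent.locks.LockSupport

// Sketch of the parking decision described above (assumed names):
if (system.needsPoll(poller)) {
  // events have been registered with the polling system: suspend inside
  // the poller so I/O readiness can wake this thread
  system.poll(poller, -1L, reportFailure)
} else {
  // no registered events: bypass the poller entirely and park the
  // old-fashioned way; the pool wakes us via LockSupport.unpark
  LockSupport.park(pool)
}
```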