@jameslovespancakes

Fixes #3055

During worker restart, a race condition occurs between WorkerProcess.restart() and WorkerManager._sync_states() that causes the restart to fail with: "Cannot spawn a worker process until it is idle."

The race condition happens when:

  1. WorkerProcess.restart() sets the state to RESTARTING and terminates the process
  2. WorkerManager._sync_states() runs concurrently and sees that the process is not alive
  3. _sync_states() changes the state from RESTARTING to COMPLETED
  4. WorkerProcess.restart() tries to spawn(), but the state is now COMPLETED
  5. spawn() raises an exception because the state is neither IDLE nor RESTARTING
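
The interleaving above can be sketched deterministically with a toy model. This is only an illustration, not Sanic's code: `ToyWorker`, `begin_restart()`, `sync_states()`, and the reduced `ProcessState` enum are hypothetical stand-ins, and `RuntimeError` is an assumed exception type. Only the state names and the error message come from this description.

```python
from enum import Enum, auto


class ProcessState(Enum):
    # Reduced, hypothetical set of states; names taken from the description.
    IDLE = auto()
    RESTARTING = auto()
    STARTED = auto()
    COMPLETED = auto()


class ToyWorker:
    """Hypothetical stand-in for a worker process (not Sanic's class)."""

    def __init__(self):
        self.state = ProcessState.STARTED
        self.alive = True

    def begin_restart(self):
        # Step 1: mark the worker as RESTARTING and terminate the old process.
        self.state = ProcessState.RESTARTING
        self.alive = False

    def spawn(self):
        # Steps 4-5: spawning is only allowed from IDLE or RESTARTING.
        # RuntimeError is an assumption made for this sketch.
        if self.state not in (ProcessState.IDLE, ProcessState.RESTARTING):
            raise RuntimeError("Cannot spawn a worker process until it is idle.")
        self.alive = True
        self.state = ProcessState.STARTED


def sync_states(workers):
    # Steps 2-3: an unguarded sync marks every dead process as COMPLETED.
    for worker in workers:
        if not worker.alive:
            worker.state = ProcessState.COMPLETED


def reproduce_race():
    worker = ToyWorker()
    worker.begin_restart()
    sync_states([worker])  # runs between terminate and spawn
    try:
        worker.spawn()
    except RuntimeError as exc:
        return str(exc)
    return None
```

Calling `reproduce_race()` returns the error message, because the sync step overwrote RESTARTING before `spawn()` ran.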

The fix prevents _sync_states() from changing the state of processes that are in the RESTARTING state, allowing the restart flow to complete without interference.

Changes:

  • Added a check in _sync_states() to skip processes whose state is RESTARTING
  • This prevents the race condition by ensuring the restart flow is not interrupted
  • Safe change: RESTARTING is a transient state managed entirely by the restart() method
  • Negligible performance impact: only one additional state check per process
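
A minimal sketch of the guard described in the list above, under the same caveats: `ToyWorker` and `sync_states_guarded()` are hypothetical names for illustration, and the real `_sync_states()` may read and write state differently. The only point is the early `continue` for RESTARTING workers.

```python
from enum import Enum, auto


class ProcessState(Enum):
    # Reduced, hypothetical set of states; names taken from the description.
    IDLE = auto()
    RESTARTING = auto()
    STARTED = auto()
    COMPLETED = auto()


class ToyWorker:
    """Hypothetical stand-in for a worker process (not Sanic's class)."""

    def __init__(self, state, alive):
        self.state = state
        self.alive = alive


def sync_states_guarded(workers):
    for worker in workers:
        # The guard: leave RESTARTING workers to the restart flow.
        if worker.state is ProcessState.RESTARTING:
            continue
        if not worker.alive:
            worker.state = ProcessState.COMPLETED


def demo():
    restarting = ToyWorker(ProcessState.RESTARTING, alive=False)
    finished = ToyWorker(ProcessState.STARTED, alive=False)
    sync_states_guarded([restarting, finished])
    return restarting.state, finished.state
```

In this sketch, `demo()` leaves the restarting worker in RESTARTING while the other dead worker is still marked COMPLETED, so a later spawn from RESTARTING would be allowed.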

@jameslovespancakes requested a review from a team as a code owner on October 21, 2025, 01:59
Development

Successfully merging this pull request may close these issues.

Restarting worker processes can cause a race condition