Skip to content

Conversation

@sjhddh
Copy link
Contributor

@sjhddh sjhddh commented Jan 23, 2026

Summary

Replace the fixed 0.5s sleep (marked with # FIXME: This test can be flaky) with a proper polling loop that waits for the response status to change before attempting to cancel.

Changes

  • Poll with 0.1s intervals until status changes from "queued" (max 5s)
  • Use proper assertions in the post-cancel verification loop
  • Remove FIXME comment as the flakiness is addressed

Motivation

  • Fixed sleeps are unreliable: too short on slow machines, wasteful on fast ones
  • Polling adapts to actual server response times
  • Reduces test flakiness in CI

Test Plan

  • Run pytest tests/v1/entrypoints/openai/serving_responses/test_stateful.py::test_background_cancel multiple times
  • Verify test passes consistently

Risk

  • Low risk: only changes test synchronization logic
  • No changes to production code

@mergify mergify bot added the v1 label Jan 23, 2026
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request improves the reliability of test_background_cancel by replacing a fixed sleep with a polling mechanism. This is a great change to address test flakiness. I've suggested a further refinement to the polling loop to use a monotonic clock for more accurate timeout handling, which will make the test even more robust.

Comment on lines 74 to 83
max_wait_seconds = 5.0
poll_interval = 0.1
elapsed = 0.0
while elapsed < max_wait_seconds:
await asyncio.sleep(poll_interval)
elapsed += poll_interval
response = await client.responses.retrieve(response.id)
if response.status != "queued":
# Started processing or completed - try to cancel
break
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current polling loop uses elapsed += poll_interval to track time, but this doesn't account for the time spent in the client.responses.retrieve(response.id) call. This can cause the effective timeout to be significantly longer than max_wait_seconds if the API call is slow, potentially slowing down tests in CI.

Using a monotonic clock like asyncio.get_running_loop().time() provides a more accurate and robust timeout mechanism. I've also reordered the loop to check the condition before sleeping, which is slightly more efficient.

Suggested change
max_wait_seconds = 5.0
poll_interval = 0.1
elapsed = 0.0
while elapsed < max_wait_seconds:
await asyncio.sleep(poll_interval)
elapsed += poll_interval
response = await client.responses.retrieve(response.id)
if response.status != "queued":
# Started processing or completed - try to cancel
break
loop = asyncio.get_running_loop()
start_time = loop.time()
max_wait_seconds = 5.0
poll_interval = 0.1
while loop.time() - start_time < max_wait_seconds:
response = await client.responses.retrieve(response.id)
if response.status != "queued":
# Started processing or completed - try to cancel
break
await asyncio.sleep(poll_interval)

Replace the fixed 0.5s sleep (which was marked as flaky) with a proper
polling loop that waits for the response status to change before
attempting to cancel.

This makes the test more deterministic:
- Poll with 0.1s intervals until status changes from "queued"
- Use proper assertions in the post-cancel verification loop
- Remove FIXME comment as the flakiness is addressed

Signed-off-by: 7. Sun <jhao.sun@gmail.com>
@sjhddh sjhddh force-pushed the tests/replace-sleep-with-polling branch from fcf361f to 20e45e4 Compare January 23, 2026 23:24
Signed-off-by: 7. Sun <jhao.sun@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant