-
-
Notifications
You must be signed in to change notification settings - Fork 12.9k
[Tests] Replace flaky sleep with polling in test_background_cancel #32986
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request improves the reliability of test_background_cancel by replacing a fixed sleep with a polling mechanism. This is a great change to address test flakiness. I've suggested a further refinement to the polling loop to use a monotonic clock for more accurate timeout handling, which will make the test even more robust.
| max_wait_seconds = 5.0 | ||
| poll_interval = 0.1 | ||
| elapsed = 0.0 | ||
| while elapsed < max_wait_seconds: | ||
| await asyncio.sleep(poll_interval) | ||
| elapsed += poll_interval | ||
| response = await client.responses.retrieve(response.id) | ||
| if response.status != "queued": | ||
| # Started processing or completed - try to cancel | ||
| break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current polling loop uses elapsed += poll_interval to track time, but this doesn't account for the time spent in the client.responses.retrieve(response.id) call. This can cause the effective timeout to be significantly longer than max_wait_seconds if the API call is slow, potentially slowing down tests in CI.
Using a monotonic clock like asyncio.get_running_loop().time() provides a more accurate and robust timeout mechanism. I've also reordered the loop to check the condition before sleeping, which is slightly more efficient.
| max_wait_seconds = 5.0 | |
| poll_interval = 0.1 | |
| elapsed = 0.0 | |
| while elapsed < max_wait_seconds: | |
| await asyncio.sleep(poll_interval) | |
| elapsed += poll_interval | |
| response = await client.responses.retrieve(response.id) | |
| if response.status != "queued": | |
| # Started processing or completed - try to cancel | |
| break | |
| loop = asyncio.get_running_loop() | |
| start_time = loop.time() | |
| max_wait_seconds = 5.0 | |
| poll_interval = 0.1 | |
| while loop.time() - start_time < max_wait_seconds: | |
| response = await client.responses.retrieve(response.id) | |
| if response.status != "queued": | |
| # Started processing or completed - try to cancel | |
| break | |
| await asyncio.sleep(poll_interval) |
Replace the fixed 0.5s sleep (which was marked as flaky) with a proper polling loop that waits for the response status to change before attempting to cancel. This makes the test more deterministic: - Poll with 0.1s intervals until status changes from "queued" - Use proper assertions in the post-cancel verification loop - Remove FIXME comment as the flakiness is addressed Signed-off-by: 7. Sun <jhao.sun@gmail.com>
fcf361f to
20e45e4
Compare
Signed-off-by: 7. Sun <jhao.sun@gmail.com>
Summary
Replace the fixed 0.5s sleep (marked with
# FIXME: This test can be flaky) with a proper polling loop that waits for the response status to change before attempting to cancel.Changes
Motivation
Test Plan
pytest tests/v1/entrypoints/openai/serving_responses/test_stateful.py::test_background_cancelmultiple timesRisk