@dhenkel92
Contributor

What type of PR is this?

/kind feature

What this PR does / why we need it:

This is a follow-up to #7370, where we introduced delayed admission checks.
This PR replaces the use of the shared RequeueState field with the new mechanism in the ProvisioningRequest admission check.

Which issue(s) this PR fixes:

Refs: KEP-3258

Special notes for your reviewer:

Does this PR introduce a user-facing change?

The ProvisioningRequest retry count of existing workloads will be reset in this version. All existing delays will still be respected.

Implements a delayed retry mechanism for admission checks to avoid overwhelming
external controllers and to reduce control plane churn.

Problem:
Previously, when admission checks transitioned to Retry state, Kueue would
immediately evict workloads and requeue them, causing excessive load on
admission check controllers and unnecessary API server churn, particularly
when retry conditions persisted predictably.

Solution:
This implementation adds two new fields to AdmissionCheckState:
- requeueAfterSeconds: Specifies minimum wait time before retry
- retryCount: Tracks retry attempts per admission check
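A minimal sketch of how the two fields might look on the API type. The struct below is a trimmed, hypothetical stand-in, not the actual Kueue definition: the Go types (*int32), the omitempty tags, and the encode helper are assumptions made for illustration. Only the JSON names requeueAfterSeconds and retryCount come from the description above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AdmissionCheckState is a hypothetical, trimmed stand-in for the Kueue type.
// The pointer types and omitempty tags are assumptions, not the real API.
type AdmissionCheckState struct {
	Name  string `json:"name"`
	State string `json:"state"`
	// RequeueAfterSeconds is the minimum wait time before the next retry.
	RequeueAfterSeconds *int32 `json:"requeueAfterSeconds,omitempty"`
	// RetryCount tracks how many times this admission check has been retried.
	RetryCount *int32 `json:"retryCount,omitempty"`
}

// encode renders a state as JSON so the serialized field names are visible.
func encode(s AdmissionCheckState) string {
	b, err := json.Marshal(s)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	delay, retries := int32(60), int32(2)
	fmt.Println(encode(AdmissionCheckState{
		Name:                "prov-req",
		State:               "Retry",
		RequeueAfterSeconds: &delay,
		RetryCount:          &retries,
	}))
}
```

Because both fields are optional in this sketch, a check that never asks for a delay serializes exactly as it did before the change.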

Key Changes:

API Changes
- Added requeueAfterSeconds and retryCount fields to AdmissionCheckState

Controller Changes
- Auto-increments retryCount on transition to Retry state
- Calculates maximum retry time across all admission checks
- Updates workload.status.requeueState.requeueAt with the maximum delay
- Workload controller now respects delayed retry times before resetting checks
- Prevents premature re-admission while delays are active
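The controller changes listed above can be sketched roughly as follows. This is an illustration, not the code in this PR: checkState, markRetry, and latestRequeueAt are hypothetical names, and measuring each delay from the time of the transition to Retry is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// checkState is a hypothetical, simplified view of one admission check.
type checkState struct {
	Name                string
	State               string
	LastTransitionTime  time.Time
	RequeueAfterSeconds int32
	RetryCount          int32
}

// markRetry moves a check into the Retry state. The counter is bumped only
// on the transition itself, so repeated updates while already in Retry do
// not inflate it.
func markRetry(c *checkState, now time.Time, delaySeconds int32) {
	if c.State != "Retry" {
		c.RetryCount++
		c.LastTransitionTime = now
	}
	c.State = "Retry"
	c.RequeueAfterSeconds = delaySeconds
}

// latestRequeueAt returns the latest retry time across all checks that are
// in Retry. The boolean is false when no check is in Retry.
// Assumption: each delay is counted from the check's transition time.
func latestRequeueAt(checks []checkState) (time.Time, bool) {
	var latest time.Time
	found := false
	for _, c := range checks {
		if c.State != "Retry" {
			continue
		}
		at := c.LastTransitionTime.Add(time.Duration(c.RequeueAfterSeconds) * time.Second)
		if !found || at.After(latest) {
			latest, found = at, true
		}
	}
	return latest, found
}

func main() {
	now := time.Date(2025, 10, 30, 18, 0, 0, 0, time.UTC)
	checks := []checkState{
		{Name: "fast", State: "Pending"},
		{Name: "slow", State: "Pending"},
	}
	markRetry(&checks[0], now, 30)
	markRetry(&checks[1], now, 120)
	if at, ok := latestRequeueAt(checks); ok {
		// In the real controller this value would be written to
		// workload.status.requeueState.requeueAt.
		fmt.Println("requeue at:", at, "retries of slow check:", checks[1].RetryCount)
	}
}
```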

Behavior:
When multiple admission checks specify different delays, Kueue uses the maximum
delay across all checks. Workloads are evicted immediately to release quota, but
admission check states aren't reset until all delays expire, preventing race
conditions where fast-responding checks could block slower ones from registering
their delays.
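To illustrate the reset gate described above, here is a small sketch with one fast and one slow check. pendingDelay and resetAllowed are hypothetical names introduced for this example. The point is that check states are reset only once every registered delay has expired, even though quota is released at eviction time.

```go
package main

import (
	"fmt"
	"time"
)

// pendingDelay is a hypothetical record of when one admission check may be retried.
type pendingDelay struct {
	Check   string
	RetryAt time.Time
}

// resetAllowed reports whether every delay has expired at the given time.
// A single unexpired delay keeps all check states from being reset.
func resetAllowed(delays []pendingDelay, now time.Time) bool {
	for _, d := range delays {
		if now.Before(d.RetryAt) {
			return false
		}
	}
	return true
}

func main() {
	evictedAt := time.Date(2025, 10, 30, 18, 0, 0, 0, time.UTC)
	delays := []pendingDelay{
		{Check: "fast", RetryAt: evictedAt.Add(10 * time.Second)},
		{Check: "slow", RetryAt: evictedAt.Add(5 * time.Minute)},
	}
	// One minute in, the fast check's delay is over but the slow one is not.
	fmt.Println(resetAllowed(delays, evictedAt.Add(time.Minute))) // false
	// After five minutes both delays have expired.
	fmt.Println(resetAllowed(delays, evictedAt.Add(5*time.Minute))) // true
}
```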

Refs: KEP-3258
https://github.com/DataDog/kueue/blob/main/keps/3258-delayed-admission-check-retries/README.md
@k8s-ci-robot added the release-note and kind/feature labels Oct 30, 2025
@netlify

netlify bot commented Oct 30, 2025

Deploy Preview for kubernetes-sigs-kueue ready!

Name Link
🔨 Latest commit ab3c246
🔍 Latest deploy log https://app.netlify.com/projects/kubernetes-sigs-kueue/deploys/6903ade2eb0ece000737f5e6
😎 Deploy Preview https://deploy-preview-7464--kubernetes-sigs-kueue.netlify.app

@k8s-ci-robot added the cncf-cla: yes label Oct 30, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: dhenkel92
Once this PR has been reviewed and has the lgtm label, please assign tenzen-y for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot requested a review from kannon92 October 30, 2025 18:26
@k8s-ci-robot added the needs-ok-to-test label Oct 30, 2025
@k8s-ci-robot
Contributor

Hi @dhenkel92. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@dhenkel92 marked this pull request as draft October 30, 2025 18:26
@k8s-ci-robot added the do-not-merge/work-in-progress and size/XXL labels Oct 30, 2025
@mbobrovskyi
Contributor

/ok-to-test

@k8s-ci-robot added the ok-to-test label and removed the needs-ok-to-test label Oct 31, 2025