feat(KEP-3258): implement delayed admission check retries #7370
base: main
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: dhenkel92. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
Hi @dhenkel92. Thanks for your PR. I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
✅ Deploy Preview for kubernetes-sigs-kueue canceled.
Force-pushed from 1f1dd14 to 8939064.
/ok-to-test
Force-pushed from 1a0db58 to ed2f2dc.
/retest
@dhenkel92 thank you for the PR
@dhenkel92 please update the release notes
Force-pushed from 308d4d4 to 319ddc9.
Force-pushed from 1a26439 to a98c452.
	StatusFinished = "finished"
)

var (
nit, I would suggest removing the distractions, but feel free to keep them if they were done by the IDE automatically
test/integration/singlecluster/scheduler/delayedadmission/delayed_admission_test.go (outdated, resolved)
test/integration/singlecluster/scheduler/delayedadmission/delayed_admission_test.go (outdated, resolved)
test/integration/singlecluster/scheduler/delayedadmission/delayed_admission_test.go (resolved)
	ginkgo.By("Wait for job to have quota again", func() {
		gomega.Eventually(func(g gomega.Gomega) {
			g.Expect(k8sClient.Get(ctx, wlLookupKey, createdWorkload)).To(gomega.Succeed())
			g.Expect(workload.HasQuotaReservation(createdWorkload)).To(gomega.BeTrue())
			g.Expect(workload.IsAdmitted(createdWorkload)).To(gomega.BeFalse())
		}, util.LongTimeout, util.Interval).Should(gomega.Succeed())
	})
IIUC in this block we could also check that the RequeueAt field is cleared
I've added a comment why it's not possible right now and I'll create a ticket afterwards.
-		Count: ptr.To[int32](1),
-	}, false)
+		Count: ptr.To[int32](2),
+	}, true)
Please verify why this change is needed
It looks like it is a flaky test that depends on the timing. I've reverted the change and will see how it performs.
 	util.SetRequeuedConditionWithPodsReadyTimeout(ctx, k8sClient, client.ObjectKeyFromObject(prodWl))
 	util.ExpectWorkloadToHaveRequeueState(ctx, k8sClient, client.ObjectKeyFromObject(prodWl), &kueue.RequeueState{
 		Count: ptr.To[int32](2),
-	}, false)
+	}, true)
Also check if this change is needed and why
I've added a comment explaining why the change is required and I'll open a ticket afterwards.
Force-pushed from 926299c to 93c08d7.
 	util.ExpectWorkloadToHaveRequeueState(ctx, k8sClient, client.ObjectKeyFromObject(wl2), &kueue.RequeueState{
 		Count: ptr.To[int32](1),
-	}, true)
+	}, false)
This is a valid change. We were not resetting RequeueState.RequeueAt for this workload state before.
hm, maybe we don't need to clean it at all then (also related to the point above). I think an old value should not cause user-observable problems anyway, wdyt? Also related to https://github.com/kubernetes-sigs/kueue/pull/7370/files#r2475037424
EDIT: so my thinking is that if the workload is admitted, then the stale value is harmless (ignored) anyway, but when we are evicting we are going to update it anyway. So, I'm not clear we need to clear it at all. Especially since we didn't clear it before this PR, it seems orthogonal.
I agree it would be "ideal", but I want to keep the scope of changes in this PR to the bare minimum.
	"should reset requeueState.requeueAt when expired": {
		cq: utiltestingapi.MakeClusterQueue("cq").Obj(),
		lq: utiltestingapi.MakeLocalQueue("lq", "ns").ClusterQueue("cq").Obj(),
		workload: utiltestingapi.MakeWorkload("wl", "ns").
			Queue("lq").
			RequeueState(nil, ptr.To(metav1.NewTime(time.Now().Add(-time.Second)))).
			Obj(),
		wantWorkload: utiltestingapi.MakeWorkload("wl", "ns").
			Queue("lq").
			Obj(),
		wantWorkloadUseMergePatch: utiltestingapi.MakeWorkload("wl", "ns").
			Queue("lq").
			Obj(),
	},
For some reason, the wantWorkload part of the test is not applying the patch and I don't understand why.
It looks like it's a limitation of using SSA patches with the fake client. The only option I see right now is to remove the test.
Force-pushed from 93c08d7 to ff342b3.
Implements a delayed retry mechanism for admission checks to prevent overwhelming external controllers and reduce control plane churn.

Problem: Previously, when admission checks transitioned to the Retry state, Kueue would immediately evict workloads and requeue them, causing excessive load on admission check controllers and unnecessary API server churn, particularly when retry conditions persisted predictably.

Solution: This implementation adds two new fields to AdmissionCheckState:
- requeueAfterSeconds: specifies the minimum wait time before retry
- retryCount: tracks retry attempts per admission check

Key Changes:

API Changes
- Added requeueAfterSeconds and retryCount fields to AdmissionCheckState

Controller Changes
- Auto-increments retryCount on transition to the Retry state
- Calculates the maximum retry time across all admission checks
- Updates workload.status.requeueState.requeueAt with the maximum delay
- The workload controller now respects delayed retry times before resetting checks
- Prevents premature re-admission while delays are active

Behavior: When multiple admission checks specify different delays, Kueue uses the maximum delay across all checks. Workloads are evicted immediately to release quota, but admission check states aren't reset until all delays expire, preventing race conditions where fast-responding checks could block slower ones from registering their delays.

Refs: KEP-3258 https://github.com/DataDog/kueue/blob/main/keps/3258-delayed-admission-check-retries/README.md
Force-pushed from ff342b3 to 0e67543.
This is a follow-up to [kubernetes-sigs#7370](kubernetes-sigs#7370), where we introduced delayed admission checks. This PR replaces the use of the shared RequeueState field with the new mechanism in the preprovision request admission check. Refs: KEP-3258 https://github.com/DataDog/kueue/blob/main/keps/3258-delayed-admission-check-retries/README.md
/retest
Let's check that there are no flakes 1/5
Let's check that there are no flakes 2/5
Let's check that there are no flakes 3/5
Let's check that there are no flakes 4/5
Let's check that there are no flakes 5/5
@dhenkel92: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
What type of PR is this?
/kind feature
/kind api-change
What this PR does / why we need it:
Implements delayed retry mechanism KEP-3258 for admission checks to prevent overwhelming external controllers and reduce control plane churn.
Problem:
Previously, when admission checks transitioned to Retry state, Kueue would immediately evict workloads and requeue them, causing excessive load on admission check controllers and unnecessary API server churn, particularly when retry conditions persisted predictably.
Solution:
This implementation adds two new fields to AdmissionCheckState:
- requeueAfterSeconds: specifies the minimum wait time before retry
- retryCount: tracks retry attempts per admission check
Key Changes:
API Changes
- Added requeueAfterSeconds and retryCount fields to AdmissionCheckState
Controller Changes
- Auto-increments retryCount on transition to the Retry state
- Calculates the maximum retry time across all admission checks
- Updates workload.status.requeueState.requeueAt with the maximum delay
- The workload controller now respects delayed retry times before resetting checks
- Prevents premature re-admission while delays are active
Behavior:
When multiple admission checks specify different delays, Kueue uses the maximum delay across all checks. Workloads are evicted immediately to release quota, but admission check states aren't reset until all delays expire, preventing race conditions where fast-responding checks could block slower ones from registering their delays.
Which issue(s) this PR fixes:
Fixes #3258
Special notes for your reviewer:
Does this PR introduce a user-facing change?
TBD