feat(KEP-3258): implement delayed admission check retries #7370
base: main
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: dhenkel92. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
Hi @dhenkel92. Thanks for your PR. I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
✅ Deploy Preview for kubernetes-sigs-kueue canceled.
Force-pushed from 1f1dd14 to 8939064.
/ok-to-test
Force-pushed from 1a0db58 to ed2f2dc.
/retest
@dhenkel92 thank you for the PR
@dhenkel92 please update the release notes
Force-pushed from 308d4d4 to 319ddc9.
Force-pushed from 1a26439 to a98c452.
	StatusFinished = "finished"
)

var (
nit, I would suggest removing the distractions, but feel free to keep them if they were done by the IDE automatically
test/integration/singlecluster/scheduler/delayedadmission/delayed_admission_test.go (outdated, resolved)
test/integration/singlecluster/scheduler/delayedadmission/delayed_admission_test.go (outdated, resolved)
test/integration/singlecluster/scheduler/delayedadmission/delayed_admission_test.go (resolved)
	ginkgo.By("Wait for job to have quota again", func() {
		gomega.Eventually(func(g gomega.Gomega) {
			g.Expect(k8sClient.Get(ctx, wlLookupKey, createdWorkload)).To(gomega.Succeed())
			g.Expect(workload.HasQuotaReservation(createdWorkload)).To(gomega.BeTrue())
			g.Expect(workload.IsAdmitted(createdWorkload)).To(gomega.BeFalse())
		}, util.LongTimeout, util.Interval).Should(gomega.Succeed())
	})
IIUC in this block we could also check that the RequeueAt field is cleared
I've added a comment why it's not possible right now and I'll create a ticket afterwards.
-		Count: ptr.To[int32](1),
-	}, false)
+		Count: ptr.To[int32](2),
+	}, true)
Please verify why this change is needed
It looks like it is a flaky test that depends on the timing. I've reverted the change and will see how it performs.
 	util.SetRequeuedConditionWithPodsReadyTimeout(ctx, k8sClient, client.ObjectKeyFromObject(prodWl))
 	util.ExpectWorkloadToHaveRequeueState(ctx, k8sClient, client.ObjectKeyFromObject(prodWl), &kueue.RequeueState{
 		Count: ptr.To[int32](2),
-	}, false)
+	}, true)
Also check if this change is needed and why
I've added a comment explaining why the change is required and I'll open a ticket afterwards.
Force-pushed from 926299c to 93c08d7.
 	util.ExpectWorkloadToHaveRequeueState(ctx, k8sClient, client.ObjectKeyFromObject(wl2), &kueue.RequeueState{
 		Count: ptr.To[int32](1),
-	}, true)
+	}, false)
This is a valid change. We were not resetting RequeueState.RequeueAt for this workload state before.
hm, maybe we don't need to clean it at all then (also related to the point above). I think an old value should not cause user-observable problems anyway, wdyt? Also related to https://github.com/kubernetes-sigs/kueue/pull/7370/files#r2475037424
EDIT: so my thinking is that if the workload is admitted, then the stale value is harmless (ignored) anyway, but when we are evicting we are going to update it anyway. So, I'm not clear we need to clear it at all. Especially since we didn't clear it before this PR, it seems orthogonal.
I agree it would be "ideal", but I want to keep the scope of changes in this PR to the bare minimum.
	"should reset requeueState.requeueAt when expired": {
		cq: utiltestingapi.MakeClusterQueue("cq").Obj(),
		lq: utiltestingapi.MakeLocalQueue("lq", "ns").ClusterQueue("cq").Obj(),
		workload: utiltestingapi.MakeWorkload("wl", "ns").
			Queue("lq").
			RequeueState(nil, ptr.To(metav1.NewTime(time.Now().Add(-time.Second)))).
			Obj(),
		wantWorkload: utiltestingapi.MakeWorkload("wl", "ns").
			Queue("lq").
			Obj(),
		wantWorkloadUseMergePatch: utiltestingapi.MakeWorkload("wl", "ns").
			Queue("lq").
			Obj(),
	},
For some reason, the wantWorkload part of the test is not applying the patch and I don't understand why.
It looks like it's a limitation of using SSA patches with the fake client. The only option I see right now is to remove the test.
Force-pushed from 93c08d7 to ff342b3.
Implements a delayed retry mechanism for admission checks to prevent overwhelming external controllers and reduce control plane churn.

Problem: Previously, when admission checks transitioned to the Retry state, Kueue would immediately evict workloads and requeue them, causing excessive load on admission check controllers and unnecessary API server churn, particularly when retry conditions persisted predictably.

Solution: This implementation adds two new fields to AdmissionCheckState:
- requeueAfterSeconds: specifies the minimum wait time before retry
- retryCount: tracks retry attempts per admission check

Key Changes:

API Changes
- Added requeueAfterSeconds and retryCount fields to AdmissionCheckState

Controller Changes
- Auto-increments retryCount on transition to the Retry state
- Calculates the maximum retry time across all admission checks
- Updates workload.status.requeueState.requeueAt with the maximum delay
- The workload controller now respects delayed retry times before resetting checks
- Prevents premature re-admission while delays are active

Behavior: When multiple admission checks specify different delays, Kueue uses the maximum delay across all checks. Workloads are evicted immediately to release quota, but admission check states aren't reset until all delays expire, preventing race conditions where fast-responding checks could block slower ones from registering their delays.

Refs: KEP-3258 https://github.com/DataDog/kueue/blob/main/keps/3258-delayed-admission-check-retries/README.md
Force-pushed from ff342b3 to 0e67543.
This is a follow-up to [kubernetes-sigs#7370](kubernetes-sigs#7370), where we introduced delayed admission checks. This PR replaces the use of the shared RequeueState field with the new mechanism in the preprovision request admission check. Refs: KEP-3258 https://github.com/DataDog/kueue/blob/main/keps/3258-delayed-admission-check-retries/README.md
/retest
Let's check that there are no flakes 1/5
Let's check that there are no flakes 2/5
Let's check that there are no flakes 3/5
Let's check that there are no flakes 4/5
Let's check that there are no flakes 5/5
@dhenkel92: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
What type of PR is this?
/kind feature
/kind api-change
What this PR does / why we need it:
Implements delayed retry mechanism KEP-3258 for admission checks to prevent overwhelming external controllers and reduce control plane churn.
Problem:
Previously, when admission checks transitioned to Retry state, Kueue would immediately evict workloads and requeue them, causing excessive load on admission check controllers and unnecessary API server churn, particularly when retry conditions persisted predictably.
Solution:
This implementation adds two new fields to AdmissionCheckState:
- requeueAfterSeconds: specifies the minimum wait time before retry
- retryCount: tracks retry attempts per admission check
Key Changes:
API Changes
- Added requeueAfterSeconds and retryCount fields to AdmissionCheckState
Controller Changes
- Auto-increments retryCount on transition to the Retry state
- Calculates the maximum retry time across all admission checks
- Updates workload.status.requeueState.requeueAt with the maximum delay
- The workload controller now respects delayed retry times before resetting checks
- Prevents premature re-admission while delays are active
Behavior:
When multiple admission checks specify different delays, Kueue uses the maximum delay across all checks. Workloads are evicted immediately to release quota, but admission check states aren't reset until all delays expire, preventing race conditions where fast-responding checks could block slower ones from registering their delays.
Which issue(s) this PR fixes:
Fixes #3258
Special notes for your reviewer:
Does this PR introduce a user-facing change?
TBD