Commit b4b4738

Add work completion doc
Signed-off-by: Jian Qiu <[email protected]>
1 parent b64dbe5 commit b4b4738

1 file changed: +283 -0 lines changed

content/en/docs/concepts/work-distribution/manifestwork.md

Lines changed: 283 additions & 0 deletions
@@ -168,6 +168,289 @@ status:
name: isAvailable
```

## Workload Completion

The workload completion feature lets `ManifestWork` track when certain workloads have
finished executing and, optionally, garbage collect them automatically. This is particularly
useful for workloads that are expected to run once and then be cleaned up, such as Jobs or Pods with
specific restart policies.

### Overview

By default, OCM recreates any resource that is deleted from a managed cluster for as long
as the `ManifestWork` exists. However, for workloads like Jobs with `ttlSecondsAfterFinished` or
Pods that exit and get cleaned up by cluster-autoscaler, this behavior is often undesirable.
The workload completion feature addresses this by:

- Tracking the completion status of workloads using condition rules
- Preventing updates to completed workloads
- Optionally garbage collecting the entire `ManifestWork` after completion
- Supporting both well-known Kubernetes resources and custom completion logic

### Condition Rules

Condition rules are configured in the `manifestConfigs` section to define how completion should
be determined for specific manifests. You can specify condition rules using the `conditionRules` field:

```yaml
apiVersion: work.open-cluster-management.io/v1
kind: ManifestWork
metadata:
  namespace: cluster1
  name: example-job
spec:
  workload:
    manifests:
      - apiVersion: batch/v1
        kind: Job
        metadata:
          name: pi-calculation
          namespace: default
        spec:
          template:
            spec:
              containers:
                - name: pi
                  image: perl:5.34.0
                  command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
              restartPolicy: Never
          backoffLimit: 4
  manifestConfigs:
    - resourceIdentifier:
        group: batch
        resource: jobs
        namespace: default
        name: pi-calculation
      conditionRules:
        - type: WellKnownCompletions
```

228+
### Well-Known Completions
229+
230+
For common Kubernetes resources, you can use the `WellKnownCompletions` type which provides
231+
built-in completion logic:
232+
233+
**Job Completion**: A Job is considered complete when it has a condition of type `Complete` or `Failed`
234+
with status `True`.
235+
236+
**Pod Completion**: A Pod is considered complete when its phase is `Succeeded` or `Failed`.
237+
238+
```yaml
239+
manifestConfigs:
240+
- resourceIdentifier:
241+
group: batch
242+
resource: jobs
243+
namespace: default
244+
name: my-job
245+
conditionRules:
246+
- type: WellKnownCompletions
247+
```
248+
249+
### Custom CEL Expressions
250+
251+
For custom resources or more complex completion logic, you can use CEL (Common Expression Language) expressions:
252+
253+
```yaml
254+
manifestConfigs:
255+
- resourceIdentifier:
256+
group: example.com
257+
resource: mycustomresources
258+
namespace: default
259+
name: my-custom-resource
260+
conditionRules:
261+
- condition: Complete
262+
type: CEL
263+
celExpressions:
264+
- expression: |
265+
object.status.conditions.filter(
266+
c, c.type == 'Complete' || c.type == 'Failed'
267+
).exists(
268+
c, c.status == 'True'
269+
)
270+
messageExpression: |
271+
result ? "Custom resource is complete" : "Custom resource is not complete"
272+
```
273+
274+
In CEL expressions:
275+
- `object`: The current instance of the manifest
276+
- `result`: Boolean result of the CEL expressions (available in messageExpression)
277+
278+
### TTL and Automatic Garbage Collection
279+
280+
You can enable automatic garbage collection of the entire `ManifestWork` after all workloads
281+
with completion rules have finished by setting `ttlSecondsAfterFinished` in the `deleteOption`:
282+
283+
```yaml
284+
apiVersion: work.open-cluster-management.io/v1
285+
kind: ManifestWork
286+
metadata:
287+
namespace: cluster1
288+
name: job-with-cleanup
289+
spec:
290+
deleteOption:
291+
ttlSecondsAfterFinished: 300 # Delete 5 minutes after completion
292+
workload:
293+
manifests:
294+
- apiVersion: batch/v1
295+
kind: Job
296+
# ... job specification
297+
manifestConfigs:
298+
- resourceIdentifier:
299+
group: batch
300+
resource: jobs
301+
namespace: default
302+
name: my-job
303+
conditionRules:
304+
- type: WellKnownCompletions
305+
```
306+
307+
**Important Notes:**
308+
- If `ttlSecondsAfterFinished` is set but no completion rules are defined, the `ManifestWork` will never be considered finished
309+
- If completion rules are set but no TTL is specified, the `ManifestWork` will complete but not be automatically deleted
310+
- Setting `ttlSecondsAfterFinished: 0` makes the `ManifestWork` eligible for immediate deletion after completion
311+
312+
### Completion Behavior
313+
314+
Once a manifest is marked as completed:
315+
316+
1. **No Further Updates**: The work agent will no longer update or recreate the completed manifest, even if the `ManifestWork` specification changes
317+
2. **ManifestWork Completion**: When all manifests with completion rules have completed, the entire `ManifestWork` is considered complete
318+
3. **Mixed Completion**: If you want some manifests to complete but not the entire `ManifestWork`, set a completion rule with CEL expression `false` for at least one other manifest
319+
320+
### Status Tracking
321+
322+
Completion status is reflected in both manifest-level and `ManifestWork`-level conditions:
323+
324+
```yaml
325+
status:
326+
conditions:
327+
- lastTransitionTime: "2025-02-20T18:53:40Z"
328+
message: "All manifests with completion rules are complete"
329+
reason: "ConditionRulesPassed"
330+
status: "True"
331+
type: Complete
332+
resourceStatus:
333+
manifests:
334+
- conditions:
335+
- lastTransitionTime: "2025-02-20T19:12:22Z"
336+
message: "Job is finished"
337+
reason: "ConditionRulesPassed"
338+
status: "True"
339+
type: Complete
340+
resourceMeta:
341+
group: batch
342+
kind: Job
343+
name: pi-calculation
344+
namespace: default
345+
ordinal: 0
346+
resource: jobs
347+
version: v1
348+
```
349+
350+
### Multiple Condition Types
351+
352+
You can define multiple condition rules for different condition types on the same manifest:
353+
354+
```yaml
355+
manifestConfigs:
356+
- resourceIdentifier:
357+
group: example.com
358+
resource: mycustomresources
359+
namespace: default
360+
name: my-resource
361+
conditionRules:
362+
- condition: Complete
363+
type: CEL
364+
celExpressions:
365+
- expression: |
366+
object.status.conditions.exists(
367+
c, c.type == 'Complete' && c.status == 'True'
368+
)
369+
messageExpression: |
370+
result ? "Resource completed successfully" : "Resource not complete"
371+
- condition: Initialized
372+
type: CEL
373+
celExpressions:
374+
- expression: |
375+
object.status.conditions.exists(
376+
c, c.type == 'Initialized' && c.status == 'True'
377+
)
378+
messageExpression: |
379+
result ? "Resource is initialized" : "Resource not initialized"
380+
```
381+
382+
### Examples
383+
384+
**Run a Job once without cleanup:**
385+
386+
```yaml
387+
apiVersion: work.open-cluster-management.io/v1
388+
kind: ManifestWork
389+
metadata:
390+
namespace: cluster1
391+
name: one-time-job
392+
spec:
393+
workload:
394+
manifests:
395+
- apiVersion: batch/v1
396+
kind: Job
397+
metadata:
398+
name: data-migration
399+
namespace: default
400+
spec:
401+
template:
402+
spec:
403+
containers:
404+
- name: migrator
405+
image: my-migration-tool:latest
406+
command: ["./migrate-data.sh"]
407+
restartPolicy: Never
408+
manifestConfigs:
409+
- resourceIdentifier:
410+
group: batch
411+
resource: jobs
412+
namespace: default
413+
name: data-migration
414+
conditionRules:
415+
- type: WellKnownCompletions
416+
```
417+
418+
**Run a Job and clean up after 30 seconds:**
419+
420+
```yaml
421+
apiVersion: work.open-cluster-management.io/v1
422+
kind: ManifestWork
423+
metadata:
424+
namespace: cluster1
425+
name: temp-job-with-cleanup
426+
spec:
427+
deleteOption:
428+
ttlSecondsAfterFinished: 30
429+
workload:
430+
manifests:
431+
- apiVersion: batch/v1
432+
kind: Job
433+
metadata:
434+
name: temp-task
435+
namespace: default
436+
spec:
437+
template:
438+
spec:
439+
containers:
440+
- name: worker
441+
image: busybox:latest
442+
command: ["echo", "Task completed"]
443+
restartPolicy: Never
444+
manifestConfigs:
445+
- resourceIdentifier:
446+
group: batch
447+
resource: jobs
448+
namespace: default
449+
name: temp-task
450+
conditionRules:
451+
- type: WellKnownCompletions
452+
```
453+
## Garbage collection

To ensure the resources applied by `ManifestWork` are reliably recorded, the work agent creates an `AppliedManifestWork` on the managed cluster for each `ManifestWork` as an anchor for the resources relating to that `ManifestWork`. When a `ManifestWork` is deleted, the work agent runs a foreground deletion: the `ManifestWork` stays in the deleting state until all of its related resources have been fully cleaned up on the managed cluster.
