Commit 43e98a1

Add work completion doc (#494)
* Add work completion doc
  Signed-off-by: Jian Qiu <[email protected]>
* resolve comments
  Signed-off-by: Jian Qiu <[email protected]>

---------

Signed-off-by: Jian Qiu <[email protected]>
1 parent ad4c9d7 commit 43e98a1

1 file changed: +290 -0 lines changed

content/en/docs/concepts/work-distribution/manifestwork.md

Lines changed: 290 additions & 0 deletions
@@ -168,6 +168,296 @@ status:
name: isAvailable
```

## Workload Completion

The workload completion feature allows `ManifestWork` to track when certain workloads have
completed their execution and optionally perform automatic garbage collection. This is particularly
useful for workloads that are expected to run once and then be cleaned up, such as Jobs or Pods with
specific restart policies.

### Overview

By default, OCM recreates any resource that is deleted from a managed cluster for as long
as the `ManifestWork` exists. For workloads such as Jobs with `ttlSecondsAfterFinished`, or
Pods that exit and get cleaned up by cluster-autoscaler, this behavior is often undesirable.
The workload completion feature addresses this by:

- Tracking the completion status of workloads using condition rules
- Preventing updates to completed workloads
- Optionally garbage collecting the entire `ManifestWork` after completion
- Supporting both well-known Kubernetes resources and custom completion logic

### Condition Rules

Condition rules are configured in the `manifestConfigs` section to define how completion should
be determined for specific manifests. You can specify condition rules using the `conditionRules` field:

```yaml
apiVersion: work.open-cluster-management.io/v1
kind: ManifestWork
metadata:
  namespace: cluster1
  name: example-job
spec:
  workload:
    manifests:
      - apiVersion: batch/v1
        kind: Job
        metadata:
          name: pi-calculation
          namespace: default
        spec:
          template:
            spec:
              containers:
                - name: pi
                  image: perl:5.34.0
                  command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
              restartPolicy: Never
          backoffLimit: 4
  manifestConfigs:
    - resourceIdentifier:
        group: batch
        resource: jobs
        namespace: default
        name: pi-calculation
      conditionRules:
        - type: WellKnownConditions
          condition: Complete
```

### Well-Known Completions

For common Kubernetes resources, you can use the `WellKnownConditions` type which provides
built-in completion logic:

**Job Completion**: A Job is considered complete when it has a condition of type `Complete` or `Failed`
with status `True`.

**Pod Completion**: A Pod is considered complete when its phase is `Succeeded` or `Failed`.

```yaml
manifestConfigs:
  - resourceIdentifier:
      group: batch
      resource: jobs
      namespace: default
      name: my-job
    conditionRules:
      - type: WellKnownConditions
        condition: Complete
```

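The built-in Pod rule can be selected in the same way as the Job rule. The fragment below is a minimal sketch, assuming a Pod named `my-pod` (a hypothetical name) in the `default` namespace, and assuming that the core API group is written as an empty string in `resourceIdentifier`:

```yaml
manifestConfigs:
  - resourceIdentifier:
      group: ""            # assumption: the core API group is the empty string
      resource: pods
      namespace: default
      name: my-pod         # hypothetical Pod name
    conditionRules:
      - type: WellKnownConditions
        condition: Complete
```

With this rule in place, the Pod would be treated as complete once its phase is `Succeeded` or `Failed`, as described above.
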
### Custom CEL Expressions

For custom resources or more complex completion logic, you can use CEL (Common Expression Language) expressions:

```yaml
manifestConfigs:
  - resourceIdentifier:
      group: example.com
      resource: mycustomresources
      namespace: default
      name: my-custom-resource
    conditionRules:
      - condition: Complete
        type: CEL
        celExpressions:
          - expression: |
              object.status.conditions.filter(
                c, c.type == 'Complete' || c.type == 'Failed'
              ).exists(
                c, c.status == 'True'
              )
        messageExpression: |
          result ? "Custom resource is complete" : "Custom resource is not complete"
```

In CEL expressions:
- `object`: The current instance of the manifest
- `result`: Boolean result of the CEL expressions (available in `messageExpression`)

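As a further illustration of the `object` and `result` variables, the sketch below marks a custom resource complete based on a phase field rather than on conditions. The resource `myphasedresources`, its name, and the `status.phase` values are hypothetical; adapt them to whatever status your resource actually reports. `has()` is the standard CEL macro for testing field presence, used here so the rule does not fail before the status is populated. The rule layout mirrors the example above.

```yaml
manifestConfigs:
  - resourceIdentifier:
      group: example.com
      resource: myphasedresources   # hypothetical resource
      namespace: default
      name: my-phased-resource      # hypothetical name
    conditionRules:
      - condition: Complete
        type: CEL
        celExpressions:
          - expression: |
              has(object.status) && has(object.status.phase) &&
              (object.status.phase == 'Succeeded' || object.status.phase == 'Failed')
        messageExpression: |
          result ? "Resource reached a terminal phase" : "Resource is still running"
```
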
### TTL and Automatic Garbage Collection

You can enable automatic garbage collection of the entire `ManifestWork` after all workloads
with completion rules have finished by setting `ttlSecondsAfterFinished` in the `deleteOption`:

```yaml
apiVersion: work.open-cluster-management.io/v1
kind: ManifestWork
metadata:
  namespace: cluster1
  name: job-with-cleanup
spec:
  deleteOption:
    ttlSecondsAfterFinished: 300  # Delete 5 minutes after completion
  workload:
    manifests:
      - apiVersion: batch/v1
        kind: Job
        # ... job specification
  manifestConfigs:
    - resourceIdentifier:
        group: batch
        resource: jobs
        namespace: default
        name: my-job
      conditionRules:
        - type: WellKnownConditions
          condition: Complete
```

**Important Notes:**
- If `ttlSecondsAfterFinished` is set but no completion rules are defined, the `ManifestWork` will never be considered finished
- If completion rules are set but no TTL is specified, the `ManifestWork` will complete but not be automatically deleted
- Setting `ttlSecondsAfterFinished: 0` makes the `ManifestWork` eligible for immediate deletion after completion

### Completion Behavior

Once a manifest is marked as completed:

1. **No Further Updates**: The work agent will no longer update or recreate the completed manifest, even if the `ManifestWork` specification changes
2. **ManifestWork Completion**: When all manifests with completion rules have completed, the entire `ManifestWork` is considered complete
3. **Mixed Completion**: If you want some manifests to complete but not the entire `ManifestWork`, set a completion rule with CEL expression `false` for at least one other manifest

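The mixed-completion case in item 3 can be sketched as follows. This assumes a `ManifestWork` that carries a Job (`my-job`) next to a long-running Deployment (`my-app`, a hypothetical name). The Deployment gets a `Complete` rule whose CEL expression is always `false`, so the `ManifestWork` as a whole is never considered complete while the Job still is. The rule layout follows the CEL example earlier in this document.

```yaml
manifestConfigs:
  - resourceIdentifier:
      group: batch
      resource: jobs
      namespace: default
      name: my-job
    conditionRules:
      - type: WellKnownConditions
        condition: Complete
  - resourceIdentifier:
      group: apps
      resource: deployments
      namespace: default
      name: my-app             # hypothetical long-running workload
    conditionRules:
      - condition: Complete
        type: CEL
        celExpressions:
          - expression: "false"   # never completes, so the ManifestWork stays open
```
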
### Status Tracking

Completion status is reflected in both manifest-level and `ManifestWork`-level conditions:

```yaml
status:
  conditions:
    - lastTransitionTime: "2025-02-20T18:53:40Z"
      message: "Job is finished"
      reason: "ConditionRulesAggregated"
      status: "True"
      type: Complete
  resourceStatus:
    manifests:
      - conditions:
          - lastTransitionTime: "2025-02-20T19:12:22Z"
            message: "Job is finished"
            reason: "ConditionRuleEvaluated"
            status: "True"
            type: Complete
        resourceMeta:
          group: batch
          kind: Job
          name: pi-calculation
          namespace: default
          ordinal: 0
          resource: jobs
          version: v1
```

Manifest-level conditions of the same type are aggregated into the `ManifestWork`-level `status.conditions`.

### Multiple Condition Types

You can define multiple condition rules for different condition types on the same manifest:

```yaml
manifestConfigs:
  - resourceIdentifier:
      group: example.com
      resource: mycustomresources
      namespace: default
      name: my-resource
    conditionRules:
      - condition: Complete
        type: CEL
        celExpressions:
          - expression: |
              object.status.conditions.exists(
                c, c.type == 'Complete' && c.status == 'True'
              )
        messageExpression: |
          result ? "Resource completed successfully" : "Resource not complete"
      - condition: Initialized
        type: CEL
        celExpressions:
          - expression: |
              object.status.conditions.exists(
                c, c.type == 'Initialized' && c.status == 'True'
              )
        messageExpression: |
          result ? "Resource is initialized" : "Resource not initialized"
```

### Examples

**Run a Job once without cleanup:**

```yaml
apiVersion: work.open-cluster-management.io/v1
kind: ManifestWork
metadata:
  namespace: cluster1
  name: one-time-job
spec:
  workload:
    manifests:
      - apiVersion: batch/v1
        kind: Job
        metadata:
          name: data-migration
          namespace: default
        spec:
          template:
            spec:
              containers:
                - name: migrator
                  image: my-migration-tool:latest
                  command: ["./migrate-data.sh"]
              restartPolicy: Never
  manifestConfigs:
    - resourceIdentifier:
        group: batch
        resource: jobs
        namespace: default
        name: data-migration
      conditionRules:
        - type: WellKnownConditions
          condition: Complete
```

**Run a Job and clean up after 30 seconds:**

```yaml
apiVersion: work.open-cluster-management.io/v1
kind: ManifestWork
metadata:
  namespace: cluster1
  name: temp-job-with-cleanup
spec:
  deleteOption:
    ttlSecondsAfterFinished: 30
  workload:
    manifests:
      - apiVersion: batch/v1
        kind: Job
        metadata:
          name: temp-task
          namespace: default
        spec:
          template:
            spec:
              containers:
                - name: worker
                  image: busybox:latest
                  command: ["echo", "Task completed"]
              restartPolicy: Never
  manifestConfigs:
    - resourceIdentifier:
        group: batch
        resource: jobs
        namespace: default
        name: temp-task
      conditionRules:
        - type: WellKnownConditions
          condition: Complete
```

## Garbage collection

To ensure the resources applied by `ManifestWork` are reliably recorded, the work agent creates an `AppliedManifestWork` on the managed cluster for each `ManifestWork` as an anchor for the resources relating to that `ManifestWork`. When a `ManifestWork` is deleted, the work agent runs a foreground deletion: the `ManifestWork` stays in the deleting state until all of its related resources have been fully cleaned up in the managed cluster.
