feat(backend): add retry attempt metadata to MLMD executions#12904
feat(backend): add retry attempt metadata to MLMD executions#12904jeffspahr wants to merge 8 commits intokubeflow:masterfrom
Conversation
Remove three unused dependencies that collectively pull in the deprecated `request` package and its large transitive tree. ## `request` (direct dependency) Used only by `frontend/scripts/replace_protos.js`, a script that downloads MLMD proto files from GitHub via `npm run build:replace`. Evidence the script is dead: - Only one file in the repo imports `request`: this script (line 19) - The script was last modified in commit e035a88 (Aug 2021) to bump MLMD to v1.2.0 - The most recent MLMD upgrade (commit eaec515, Aug 2023, bumping to v1.14.0) updated the proto files directly in git without running this script - `build:replace` is not referenced in any CI workflow, Makefile, or other script - The proto files the script manages (third_party/ml-metadata/...) are checked directly into git and updated manually ## `coveralls` (devDependency) Submits test coverage data to the Coveralls service. Evidence it is dead: - The two scripts that used it were deleted in commit c7f8729 (Sep 2024, "chore: Remove unwanted Frontend test files kubeflow#10973"): `frontend/scripts/report-coveralls.sh` and `frontend/scripts/get-coveralls-repo-token.js` - The npm script that invoked them (`test:ci:prow`) was also removed in that commit - No current npm script, CI workflow, or source file references coveralls - No .coveralls.yml config file exists ## `swagger-ts-client` (devDependency) Generates TypeScript API clients from Swagger specs. Evidence it is dead: - Present since the initial commit 633e2dd but was never actually used — the repo has always used Java-based swagger-codegen-cli.jar for API client generation (see apis:* scripts in package.json) - Zero imports, references, npm scripts, or config files for swagger-ts-client exist anywhere in the repo ## Changes - Delete `frontend/scripts/replace_protos.js` - Remove `request`, `coveralls`, `swagger-ts-client` from package.json - Remove `build:replace` npm script - Regenerate package-lock.json (64 transitive packages pruned) - Remove `build:replace` documentation from CONTRIBUTING.md ## Verification - npm ci: passes - npm run lint: passes - npm run typecheck: passes - npm run test: 117/117 files, 1652/1652 tests pass - npm ls request: confirms request is no longer in the tree Signed-off-by: Jeff Spahr <spahrj@gmail.com>
Address Copilot review feedback: - Remove `npm run build:replace` reference from third_party/ml-metadata/README.md, point to build:protos instead - Remove dead Coveralls badge from root README.md (service no longer used since kubeflow#10973) Signed-off-by: Jeff Spahr <spahrj@gmail.com>
The previous lockfile was generated with Node 25 / npm 11, which produces a different resolution format. CI uses Node 22.19.0 / npm 10.9.3 (per .nvmrc), causing npm ci to fail with missing entries (e.g. babel-plugin-macros@3.1.0). Regenerated with the correct toolchain version. Signed-off-by: Jeff Spahr <spahrj@gmail.com>
Resolve frontend/package-lock.json conflict by regenerating the lockfile with Node 22.19.0 / npm 10.9.3 after applying upstream master changes. Signed-off-by: Jeff Spahr <spahrj@gmail.com>
Use the current master lockfile as baseline and regenerate lock metadata under Node 22.19.0/npm 10 so only this PR's dependency removals remain. This avoids unintended transitive upgrades that caused frontend TypeScript failures in CI. Signed-off-by: Jeff Spahr <spahrj@gmail.com>
Populate max_retry_count and retry_count as int custom properties on MLMD executions so the frontend can display retry visibility (PR kubeflow#12596). - metadata/client.go: add MaxRetryCount/RetryCount fields to ExecutionConfig; write them in CreateExecution and PrePublishExecution - driver/container.go: read retry policy max_retry_count at execution creation time - compiler/argocompiler/container.go: inject KFP_RETRY_COUNT={{retries}} env var into the impl template when retry is configured - component/constants.go: add EnvRetryCount constant - component/launcher_v2.go: read KFP_RETRY_COUNT env in prePublish and set retry_count on the execution config - compiler/argocompiler/container_test.go: verify env var presence, absence, and that commonEnvs is not mutated Signed-off-by: Jeff Spahr <spahrj@gmail.com>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Pull request overview
Adds retry-attempt visibility to ML Metadata (MLMD) executions for KFP v2 so the UI can display retry information, by persisting both the configured max retries and the current retry counter. Also includes some frontend/proto-tooling cleanup (removed deprecated scripts/deps and coverage badge).
Changes:
- Persist
max_retry_counton MLMD execution creation (from task retry policy) andretry_countduring pre-publish (fromKFP_RETRY_COUNT). - Argo compiler injects
KFP_RETRY_COUNT={{retries}}when a task has a retry policy, and adds tests around env var presence andcommonEnvsbehavior. - Removes legacy frontend proto replacement script and prunes related frontend dependencies / Coveralls badge.
Reviewed changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
backend/src/v2/metadata/client.go |
Adds max_retry_count / retry_count MLMD custom properties support via ExecutionConfig. |
backend/src/v2/driver/container.go |
Populates MaxRetryCount from the task retry policy when creating executions. |
backend/src/v2/component/launcher_v2.go |
Reads KFP_RETRY_COUNT and writes retry_count at pre-publish time. |
backend/src/v2/component/constants.go |
Introduces EnvRetryCount constant. |
backend/src/v2/compiler/argocompiler/container.go |
Injects KFP_RETRY_COUNT={{retries}} into container templates when retry is configured. |
backend/src/v2/compiler/argocompiler/container_test.go |
Adds compiler tests for retry env var and commonEnvs non-mutation. |
frontend/scripts/replace_protos.js |
Removes legacy proto replacement script. |
frontend/package.json |
Removes build:replace and drops unused/deprecated deps (e.g., request, coveralls, swagger-ts-client). |
frontend/package-lock.json |
Lockfile updates reflecting removed deps. |
frontend/CONTRIBUTING.md |
Removes documentation for the deleted build:replace workflow. |
third_party/ml-metadata/README.md |
Updates MLMD JS proto regeneration docs link/instructions. |
README.md |
Removes Coveralls badge. |
Files not reviewed (1)
- frontend/package-lock.json: Language not supported
- Clarify RetryCount comment as 0-based counter (matches Argo {{retries}})
- Log warning at V(2) when KFP_RETRY_COUNT parse fails instead of
silently dropping retry metadata
- Deep-compare commonEnvs contents in mutation test instead of just
checking length
Signed-off-by: Jeff Spahr <spahrj@gmail.com>
Add the KFP_RETRY_COUNT={{retries}} env var to the expected compiled
workflow for pipeline_with_retry, matching the new compiler behavior.
Signed-off-by: Jeff Spahr <spahrj@gmail.com>
Summary
max_retry_countandretry_countas int custom properties on MLMD executions so the frontend can display retry visibility (companion to feat(#12169): Show retry attempts for pipeline tasks in UI #12596)max_retry_countat execution creation time from the task's retry policy; the launcher writesretry_count(current attempt) at pre-publish time via theKFP_RETRY_COUNTenv var injected by the Argo compilercommonEnvsslice is not mutatedrequest,coveralls,swagger-ts-client), the unusedbuild:replacescript, and the Coveralls badgeTest plan
go build ./...passesgo test ./compiler/argocompiler/...— 3 new tests pass (env var present, absent, no mutation)go test ./driver/... ./component/... ./metadata/...— all existing tests passretry_count/max_retry_countwithgetIntValue()(notgetStringValue())