Skip to content

feat: add managedBy field for MultiKueue integration (#2879)#2896

Open
1Ayush-Petwal wants to merge 1 commit intokubeflow:masterfrom
1Ayush-Petwal:feat/managedby-field
Open

feat: add managedBy field for MultiKueue integration (#2879)#2896
1Ayush-Petwal wants to merge 1 commit intokubeflow:masterfrom
1Ayush-Petwal:feat/managedby-field

Conversation

@1Ayush-Petwal
Copy link
Copy Markdown

Purpose of this PR

Implement spec.managedBy field to enable external controller integration (e.g., MultiKueue) for managing SparkApplication lifecycle. This allows multi-cluster schedulers to take control of SparkApplication resources while the operator respects and skips reconciliation.

Closes #2879

Proposed changes:

  • Add ManagedBy *string field to SparkApplicationSpec allowing external controllers to claim resource management
  • Add early-return guard in Reconcile() to skip reconciliation when managedBy is set to a non-operator value
  • Add immutability validation in webhook ValidateUpdate() to prevent changes to managedBy once set
  • Define SparkOperatorManagerName = "sparkoperator.k8s.io/spark-operator" constant for operator identification
  • Auto-generate CRD schema and deepcopy methods with new field

Change Category

  • Bugfix (non-breaking change which fixes an issue)
  • Feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that could affect existing functionality)
  • Documentation update

Rationale

The managedBy pattern is a standard Kubernetes convention (adopted by batch/v1 Job, JobSet, MPI Operator, Training Operator, KubeRay) that enables clean separation of concerns. External controllers like MultiKueue can now manage SparkApplication resources across clusters without conflicts with the built-in operator.

Checklist

  • I have conducted a self-review of my own code.
  • I have updated documentation accordingly.
  • I have added tests that prove my changes are effective or that my feature works.
  • Existing unit tests pass locally with my changes.

Additional Notes

  • When spec.managedBy is unset or equals "sparkoperator.k8s.io/spark-operator", the operator manages the resource (default behavior, backward compatible)
  • When set to any other value, the operator logs and skips all reconciliation
  • The field is immutable once set, enforced by admission webhook
  • No breaking changes; fully backward compatible

Signed-off-by: Ayush Petwal <ayushpetwal.0105@gmail.com>
Copilot AI review requested due to automatic review settings March 29, 2026 20:00
@github-actions
Copy link
Copy Markdown

🎉 Welcome to the Kubeflow Spark Operator! 🎉

Thanks for opening your first PR! We're happy to have you as part of our community 🚀

Here's what happens next:

Join the community:

Feel free to ask questions in the comments if you need any help or clarification!
Thanks again for contributing to Kubeflow! 🙏

@google-oss-prow google-oss-prow bot requested review from ImpSy and nabuskey March 29, 2026 20:00
@google-oss-prow
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign yuchaoran2011 for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a spec.managedBy field to SparkApplication to allow external controllers (e.g., MultiKueue) to take over lifecycle management while the Spark Operator conditionally skips reconciliation, with webhook-enforced immutability and regenerated API/CRD artifacts.

Changes:

  • Introduce SparkApplicationSpec.ManagedBy plus common.SparkOperatorManagerName constant.
  • Skip SparkApplication reconciliation when managedBy is set to a non-operator value.
  • Enforce managedBy immutability in the validating webhook and update generated CRD/deepcopy outputs.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
pkg/common/spark.go Adds the operator’s canonical manager identifier constant used for managedBy comparisons.
internal/controller/sparkapplication/controller.go Adds a reconciliation guard to skip handling externally-managed SparkApplications.
internal/webhook/sparkapplication_validator.go Adds webhook validation to enforce spec.managedBy immutability.
internal/webhook/sparkapplication_validator_test.go Adds unit tests covering managedBy immutability behavior.
api/v1beta2/sparkapplication_types.go Adds ManagedBy to the SparkApplication API type.
api/v1beta2/zz_generated.deepcopy.go Regenerates deepcopy logic for the new ManagedBy field.
config/crd/bases/sparkoperator.k8s.io_sparkapplications.yaml Updates SparkApplication CRD schema/documentation for managedBy.
config/crd/bases/sparkoperator.k8s.io_scheduledsparkapplications.yaml Updates ScheduledSparkApplication CRD schema/documentation where SparkApplicationSpec is embedded.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 197 to +205
// Check if the spark application is being deleted
if !app.DeletionTimestamp.IsZero() {
return r.handleSparkApplicationDeletion(ctx, req)
}

// Skip reconciliation if another controller is managing this application.
if app.Spec.ManagedBy != nil && *app.Spec.ManagedBy != common.SparkOperatorManagerName {
logger.Info("Skipping reconciliation: managed by external controller", "managedBy", *app.Spec.ManagedBy)
return ctrl.Result{}, nil
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The managedBy skip-guard runs after the deletion check, so SparkApplications marked for deletion will still go through handleSparkApplicationDeletion() even when spec.managedBy points to an external controller. That can lead to the operator deleting driver pods / services / ingresses that are meant to be owned/cleaned up by the external controller. Consider applying the managedBy guard to the deletion path as well (or scoping deletion only to resources the operator created).

Copilot uses AI. Check for mistakes.
Comment on lines +202 to +205
// Skip reconciliation if another controller is managing this application.
if app.Spec.ManagedBy != nil && *app.Spec.ManagedBy != common.SparkOperatorManagerName {
logger.Info("Skipping reconciliation: managed by external controller", "managedBy", *app.Spec.ManagedBy)
return ctrl.Result{}, nil
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change introduces a new reconciliation branch (skip when spec.managedBy is set to a non-operator value), but there doesn’t appear to be controller-level test coverage for it. Since internal/controller/sparkapplication/controller_test.go already exercises many Reconcile branches, adding a focused test that asserts no state/status changes and no child resources are created when managedBy is external would help prevent regressions.

Copilot uses AI. Check for mistakes.
// the operator skips reconciliation, allowing external controllers
// (e.g. MultiKueue) to manage the resource.
// This field is immutable once set.
// +optional
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ManagedBy is a *string and can be set to an empty string (managedBy: ""), which would be treated as “set” and currently causes the controller to skip reconciliation (since "" != SparkOperatorManagerName). If empty values aren’t meaningful, add API validation (e.g., +kubebuilder:validation:MinLength=1 / pattern) and/or webhook validation to reject empty/whitespace-only values.

Suggested change
// +optional
// +optional
// +kubebuilder:validation:MinLength=1

Copilot uses AI. Check for mistakes.
Comment on lines +105 to +111
oldVal := *oldApp.Spec.ManagedBy
newVal := ""
if newApp.Spec.ManagedBy != nil {
newVal = *newApp.Spec.ManagedBy
}
if oldVal != newVal {
return nil, fmt.Errorf("spec.managedBy is immutable once set: cannot change from %q to %q", oldVal, newVal)
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The immutability check normalizes newApp.Spec.ManagedBy == nil to an empty string. This makes managedBy: "" (empty but set) effectively equivalent to being unset, allowing a change from empty-string to nil (and vice versa) without triggering the immutability guard. Consider comparing pointer presence/value directly (e.g., treat nil vs non-nil as a change) and/or rejecting empty-string values during validation.

Suggested change
oldVal := *oldApp.Spec.ManagedBy
newVal := ""
if newApp.Spec.ManagedBy != nil {
newVal = *newApp.Spec.ManagedBy
}
if oldVal != newVal {
return nil, fmt.Errorf("spec.managedBy is immutable once set: cannot change from %q to %q", oldVal, newVal)
// Once spec.managedBy has been set, it cannot be changed or unset.
if newApp.Spec.ManagedBy == nil {
return nil, fmt.Errorf("spec.managedBy is immutable once set: cannot change from %q to unset", *oldApp.Spec.ManagedBy)
}
if *oldApp.Spec.ManagedBy != *newApp.Spec.ManagedBy {
return nil, fmt.Errorf("spec.managedBy is immutable once set: cannot change from %q to %q", *oldApp.Spec.ManagedBy, *newApp.Spec.ManagedBy)

Copilot uses AI. Check for mistakes.
@tariq-hasan
Copy link
Copy Markdown
Contributor

Some additional review comments:

You would also need to run make generate to regenerate the OpenAPI spec and the Python models given that you are introducing a new field in the spec.

In addition, you might want to look at some of these tests for managedBy and consider if you would want to do the same thing here: https://github.com/kubeflow/trainer/blob/master/test/integration/webhooks/trainjob_test.go#L305.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support managedBy field on SparkApplication for integration with MultiKueue

3 participants