Executor pods not explicitly deleted when SparkApplication enters INVALIDATING state #2803

@Shekharrajak

Description

What happened?

  • ✋ I have searched the open/closed issues and my issue is not listed.

When spec.executor.instances is updated on a running application, the SparkApplication correctly transitions to the INVALIDATING state, but the operator does not explicitly delete the executor pods. Cleanup instead relies on Spark's automatic cleanup when the driver terminates. This leaves a timing window in which old executor pods may keep running, potentially causing resource leaks or state inconsistencies.

Steps to Reproduce

1. Create a SparkApplication with one executor

kubectl apply -f - <<EOF
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi-test
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: docker.io/library/spark:4.0.0
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples.jar
  arguments: ["10000"]
  sparkVersion: 4.0.0
  driver:
    cores: 1
    memory: 512m
    serviceAccount: spark-operator-spark
  executor:
    instances: 1
    cores: 1
    memory: 512m
EOF

2. Wait for the RUNNING state

kubectl wait --for=jsonpath='{.status.applicationState.state}'=RUNNING sparkapplication/spark-pi-test --timeout=60s
kubectl get pods -l spark-app-name=spark-pi-test

3. Increase the number of executor instances

kubectl patch sparkapplication spark-pi-test --type='merge' -p='{"spec":{"executor":{"instances":3}}}'

4. Observe the issue

# Check application state (should be INVALIDATING)
kubectl get sparkapplication spark-pi-test -o jsonpath='{.status.applicationState.state}'

# Check if old executor pods are still running
kubectl get pods -l spark-role=executor | grep spark-pi

# Check operator logs: no explicit executor pod deletion is logged
kubectl logs -n spark-operator -l app=spark-operator-controller | grep -i "deleting executor pod"
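
To put a number on the timing window described above, a small polling loop can sample the executor pod count once per second while the patch from step 3 is applied. This is a sketch, not part of the operator: the function names count_running and watch_executors are hypothetical, and it assumes executor pods carry both the spark-role=executor and spark-app-name=<app> labels used in the commands above.

```shell
#!/usr/bin/env bash
# Sketch for measuring how long old executor pods survive after the patch.
# Assumption: executor pods carry the spark-role=executor and
# spark-app-name=<app> labels used in the reproduction steps.
# Function names are hypothetical.

# Reads "NAME PHASE" lines on stdin and prints how many are in phase Running.
count_running() {
  awk '$2 == "Running" { n++ } END { print n + 0 }'
}

# watch_executors APP SAMPLES
# Prints one timestamped count of Running executor pods per second.
watch_executors() {
  local app=$1 samples=$2 i
  for ((i = 0; i < samples; i++)); do
    kubectl get pods -l "spark-role=executor,spark-app-name=${app}" \
      --no-headers -o custom-columns=NAME:.metadata.name,PHASE:.status.phase \
      | count_running \
      | sed "s/^/$(date +%T) running_executors=/"
    sleep 1
  done
}
```

For example, start watch_executors spark-pi-test 60 in a second terminal just before running the kubectl patch command; a nonzero count that persists after the state change shows how long the window stays open.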

Expected Behavior

The operator should explicitly delete executor pods when the application enters the INVALIDATING state, so that cleanup is deterministic rather than dependent on driver termination.
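
Until the operator does this itself, the expected cleanup can be approximated by hand. The sketch below is a manual workaround, not the operator's implementation: executor_selector and delete_executors are hypothetical names, and the label selector again assumes the spark-role and spark-app-name labels from the reproduction steps. It should be run only once the driver pod is gone, since a still-running driver may request replacement executors.

```shell
# Hypothetical helper: build the executor label selector for an application,
# using the labels assumed in the reproduction steps.
executor_selector() {
  printf 'spark-role=executor,spark-app-name=%s\n' "$1"
}

# Delete all executor pods of APP and block until they are removed
# (or until the 60s timeout expires).
delete_executors() {
  local app=$1
  kubectl delete pods -l "$(executor_selector "$app")" --wait=true --timeout=60s
}
```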

Environment & Versions

  • Kubernetes Version:
  • Spark Operator Version:
  • Apache Spark Version:

Additional context

No response

Labels: kind/bug (Something isn't working)