What happened?
A SparkApplication failed and the operator moved it to PENDING_RERUN. The operator deleted the driver pod and web UI Service, but one reconcile pass logged
Resources associated with SparkApplication still exist
and then stopped. The SparkApplication remained stuck in PENDING_RERUN and was never retried.
Logs for reference:
2026-04-08T00:30:32.023Z INFO sparkapplication/controller.go:194 Reconciling SparkApplication {"controller": "spark-application-controller", "namespace": "harness-helm-new", "name": "hlu-ledger-1775608200001434796", "reconcileID": "4907d4d1-bbaa-4c2d-842c-e4304f5d28a4", "state": "RUNNING"}
2026-04-08T00:30:32.042Z INFO sparkapplication/controller.go:218 Finished reconciling SparkApplication {"controller": "spark-application-controller", "namespace": "harness-helm-new", "name": "hlu-ledger-1775608200001434796", "reconcileID": "4907d4d1-bbaa-4c2d-842c-e4304f5d28a4"}
2026-04-08T00:30:32.474Z INFO sparkapplication/controller.go:194 Reconciling SparkApplication {"controller": "spark-application-controller", "namespace": "harness-helm-new", "name": "hlu-ledger-1775608200001434796", "reconcileID": "dd9aa0b0-6632-4676-b85a-0a2a42318588", "state": "RUNNING"}
2026-04-08T00:30:32.521Z INFO sparkapplication/controller.go:218 Finished reconciling SparkApplication {"controller": "spark-application-controller", "namespace": "harness-helm-new", "name": "hlu-ledger-1775608200001434796", "reconcileID": "dd9aa0b0-6632-4676-b85a-0a2a42318588"}
2026-04-08T00:30:33.956Z INFO sparkapplication/event_handler.go:87 Spark pod updated {"name": "hlu-ledger-1775608200001434796-driver", "namespace": "harness-helm-new", "oldPhase": "Running", "newPhase": "Failed"}
2026-04-08T00:30:33.962Z INFO sparkapplication/controller.go:194 Reconciling SparkApplication {"controller": "spark-application-controller", "namespace": "harness-helm-new", "name": "hlu-ledger-1775608200001434796", "reconcileID": "047a0b40-97ea-4d05-a367-7acdb59d6c99", "state": "RUNNING"}
2026-04-08T00:30:33.983Z INFO sparkapplication/controller.go:218 Finished reconciling SparkApplication {"controller": "spark-application-controller", "namespace": "harness-helm-new", "name": "hlu-ledger-1775608200001434796", "reconcileID": "047a0b40-97ea-4d05-a367-7acdb59d6c99"}
2026-04-08T00:30:33.991Z INFO sparkapplication/event_handler.go:195 SparkApplication updated {"name": "hlu-ledger-1775608200001434796", "namespace": "harness-helm-new", "oldState": "RUNNING", "newState": "FAILING"}
2026-04-08T00:30:33.997Z INFO sparkapplication/controller.go:194 Reconciling SparkApplication {"controller": "spark-application-controller", "namespace": "harness-helm-new", "name": "hlu-ledger-1775608200001434796", "reconcileID": "29960254-1df1-44cf-ae74-c5d14a6776ff", "state": "FAILING"}
2026-04-08T00:30:33.997Z INFO sparkapplication/controller.go:1140 Deleting driver pod {"controller": "spark-application-controller", "namespace": "harness-helm-new", "name": "hlu-ledger-1775608200001434796", "reconcileID": "29960254-1df1-44cf-ae74-c5d14a6776ff", "pod": "hlu-ledger-1775608200001434796-driver"}
2026-04-08T00:30:34.073Z INFO sparkapplication/controller.go:1162 Deleting Spark web UI service {"controller": "spark-application-controller", "namespace": "harness-helm-new", "name": "hlu-ledger-1775608200001434796", "reconcileID": "29960254-1df1-44cf-ae74-c5d14a6776ff", "service": "hlu-ledger-1775608200001434796-ui-svc"}
2026-04-08T00:30:34.163Z INFO sparkapplication/controller.go:226 Finished reconciling SparkApplication {"controller": "spark-application-controller", "namespace": "harness-helm-new", "name": "hlu-ledger-1775608200001434796", "reconcileID": "29960254-1df1-44cf-ae74-c5d14a6776ff"}
2026-04-08T00:30:34.266Z INFO sparkapplication/event_handler.go:103 Spark pod deleted {"pod": "hlu-ledger-1775608200001434796-driver", "phase": "Failed"}
2026-04-08T00:30:34.272Z INFO sparkapplication/controller.go:194 Reconciling SparkApplication {"controller": "spark-application-controller", "namespace": "harness-helm-new", "name": "hlu-ledger-1775608200001434796", "reconcileID": "02dfc5e8-f7ab-4392-a189-5587efa34faf", "state": "FAILING"}
2026-04-08T00:30:34.272Z INFO sparkapplication/controller.go:1140 Deleting driver pod {"controller": "spark-application-controller", "namespace": "harness-helm-new", "name": "hlu-ledger-1775608200001434796", "reconcileID": "02dfc5e8-f7ab-4392-a189-5587efa34faf", "pod": "hlu-ledger-1775608200001434796-driver"}
2026-04-08T00:30:34.292Z INFO sparkapplication/controller.go:1162 Deleting Spark web UI service {"controller": "spark-application-controller", "namespace": "harness-helm-new", "name": "hlu-ledger-1775608200001434796", "reconcileID": "02dfc5e8-f7ab-4392-a189-5587efa34faf", "service": "hlu-ledger-1775608200001434796-ui-svc"}
2026-04-08T00:30:34.320Z INFO sparkapplication/event_handler.go:195 SparkApplication updated {"name": "hlu-ledger-1775608200001434796", "namespace": "harness-helm-new", "oldState": "FAILING", "newState": "PENDING_RERUN"}
2026-04-08T00:30:34.331Z INFO sparkapplication/controller.go:226 Finished reconciling SparkApplication {"controller": "spark-application-controller", "namespace": "harness-helm-new", "name": "hlu-ledger-1775608200001434796", "reconcileID": "02dfc5e8-f7ab-4392-a189-5587efa34faf"}
2026-04-08T00:30:34.331Z INFO sparkapplication/controller.go:194 Reconciling SparkApplication {"controller": "spark-application-controller", "namespace": "harness-helm-new", "name": "hlu-ledger-1775608200001434796", "reconcileID": "1f85d3da-8171-45a9-b208-145b7e3acc71", "state": "PENDING_RERUN"}
2026-04-08T00:30:34.331Z INFO sparkapplication/controller.go:495 Pending rerun SparkApplication {"controller": "spark-application-controller", "namespace": "harness-helm-new", "name": "hlu-ledger-1775608200001434796", "reconcileID": "1f85d3da-8171-45a9-b208-145b7e3acc71", "state": "PENDING_RERUN"}
2026-04-08T00:30:34.336Z INFO sparkapplication/controller.go:502 Resources associated with SparkApplication still exist {"controller": "spark-application-controller", "namespace": "harness-helm-new", "name": "hlu-ledger-1775608200001434796", "reconcileID": "1f85d3da-8171-45a9-b208-145b7e3acc71"}
2026-04-08T00:30:34.386Z INFO sparkapplication/controller.go:220 Finished reconciling SparkApplication {"controller": "spark-application-controller", "namespace": "harness-helm-new", "name": "hlu-ledger-1775608200001434796", "reconcileID": "1f85d3da-8171-45a9-b208-145b7e3acc71"}
Reproduction Code
No response
Expected behavior
If the application is in PENDING_RERUN, the controller should keep reconciling until cleanup is fully complete and the SparkApplication is resubmitted, or it should transition to a terminal state if retry is no longer possible.
Actual behavior
The controller exits the reconcile pass after seeing that some Spark resources still exist, without scheduling another reconcile. No further event arrives for the application, so the SparkApplication remains stuck in PENDING_RERUN.
Environment & Versions
k8s version: GKE cluster
Spark Operator Version: 2.4.0
Apache Spark Version: 3.5.4
Additional context
Status observed:
PENDING_RERUN
submissionAttempts=1
executionAttempts=1
driver pod already deleted
UI Service already deleted
Impacted by this bug?
Give it a 👍 We prioritize the issues with most 👍