
Commit 9da1cd0

[SPARK-50996][K8S] Increase spark.kubernetes.allocation.batch.size to 10
### What changes were proposed in this pull request?

This PR aims to increase `spark.kubernetes.allocation.batch.size` from 5 to 10 in Apache Spark 4.0.0.

### Why are the changes needed?

Since Apache Spark 2.3.0, Apache Spark has conservatively used `5` as the default executor allocation batch size for 8 years.

- #19468

Given the improvement of K8s hardware infrastructure over the last 8 years, it is reasonable to use a bigger value, `10`, starting from Apache Spark 4.0.0 in 2025. Concretely, when we request 1200 executor pods (see the sketch after this description):

- Batch size `5` takes 4 minutes.
- Batch size `10` takes 2 minutes.

### Does this PR introduce _any_ user-facing change?

Yes, users will see faster Spark job resource allocation. The migration guide is updated correspondingly.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #49681 from dongjoon-hyun/SPARK-50996.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
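The timing figures above follow from the allocator's round-based pacing: roughly one batch is launched per allocation round, so the wall-clock time is about `ceil(pods / batchSize)` rounds. A minimal Scala sketch of that arithmetic; the one-second round delay is an assumption taken from the default of `spark.kubernetes.allocation.batch.delay`, not something this commit changes:

```scala
// Back-of-the-envelope allocation time: one batch per allocation round,
// one round per spark.kubernetes.allocation.batch.delay (default 1s, assumed here).
def allocationSeconds(totalPods: Int, batchSize: Int, batchDelaySec: Int = 1): Int =
  math.ceil(totalPods.toDouble / batchSize).toInt * batchDelaySec

allocationSeconds(1200, batchSize = 5)   // 240s, i.e. ~4 minutes
allocationSeconds(1200, batchSize = 10)  // 120s, i.e. ~2 minutes
```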
1 parent: e7821c8

File tree: 3 files changed (+4, −2 lines)


docs/core-migration-guide.md

Lines changed: 2 additions & 0 deletions

```diff
@@ -36,6 +36,8 @@ license: |
 
 - In Spark 4.0, support for Apache Mesos as a resource manager was removed.
 
+- Since Spark 4.0, Spark will allocate executor pods with a batch size of `10`. To restore the legacy behavior, you can set `spark.kubernetes.allocation.batch.size` to `5`.
+
 - Since Spark 4.0, Spark uses `ReadWriteOncePod` instead of `ReadWriteOnce` access mode in persistence volume claims. To restore the legacy behavior, you can set `spark.kubernetes.legacy.useReadWriteOnceAccessMode` to `true`.
 
 - Since Spark 4.0, Spark reports its executor pod status by checking all containers of that pod. To restore the legacy behavior, you can set `spark.kubernetes.executor.checkAllContainers` to `false`.
```

docs/running-on-kubernetes.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -682,7 +682,7 @@ See the [configuration page](configuration.html) for information on Spark config
 </tr>
 <tr>
   <td><code>spark.kubernetes.allocation.batch.size</code></td>
-  <td><code>5</code></td>
+  <td><code>10</code></td>
   <td>
     Number of pods to launch at once in each round of executor pod allocation.
   </td>
```
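As the migration guide entry above notes, the pre-4.0 pacing can be restored by setting the property back to `5`. A minimal sketch of doing that programmatically when building a session; the application name is illustrative, not from this commit:

```scala
import org.apache.spark.sql.SparkSession

// Restore the pre-4.0 behavior: launch 5 executor pods per allocation round.
val spark = SparkSession.builder()
  .appName("legacy-batch-size-example") // hypothetical app name
  .config("spark.kubernetes.allocation.batch.size", "5")
  .getOrCreate()
```

The same key can equally be passed at submit time with `--conf spark.kubernetes.allocation.batch.size=5`.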

resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala

Lines changed: 1 addition & 1 deletion

```diff
@@ -450,7 +450,7 @@ private[spark] object Config extends Logging {
       .version("2.3.0")
       .intConf
       .checkValue(value => value > 0, "Allocation batch size should be a positive integer")
-      .createWithDefault(5)
+      .createWithDefault(10)
 
   val KUBERNETES_ALLOCATION_BATCH_DELAY =
     ConfigBuilder("spark.kubernetes.allocation.batch.delay")
```
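For context, the builder chain above yields a typed config entry whose semantics are: default to `10` when the key is unset, and reject non-positive values. A standalone Scala sketch of those semantics, mirroring what `createWithDefault` and `checkValue` encode; this is a model, not Spark's actual internal getter:

```scala
// Standalone model of the entry's behavior: default 10, positive values only.
def allocationBatchSize(raw: Option[String]): Int = {
  val value = raw.map(_.trim.toInt).getOrElse(10)   // createWithDefault(10)
  require(value > 0, "Allocation batch size should be a positive integer") // checkValue
  value
}

allocationBatchSize(None)      // 10 -> the new 4.0 default
allocationBatchSize(Some("5")) // 5  -> the legacy behavior
```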
