Describe the bug
The Spark connector for Azure Cosmos DB (azure-cosmos-spark_3-1_2-12) creates an increasing number of Spark tasks over time, even when:
The Cosmos container has a fixed number of physical partitions (75)
The partitioning strategy is set to Custom
The number of partitions is explicitly capped using spark.cosmos.partitioning.targetedCount = 75
Despite this configuration, the number of Spark tasks continues to grow between job runs (e.g., 75 → 150 → 300...), even when nothing else has changed. This results in performance issues and unpredictable scaling.
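For reference, the configuration in question can also be expressed as a single options map. This is only a sketch; the endpoint, key, database, and container values are placeholders.

// Sketch of the read configuration described above (placeholder values in <>).
val cosmosReadConfig = Map(
  "spark.cosmos.accountEndpoint"            -> "<accountEndpoint>",
  "spark.cosmos.accountKey"                 -> "<accountKey>",
  "spark.cosmos.database"                   -> "<database>",
  "spark.cosmos.container"                  -> "<container>",
  "spark.cosmos.read.partitioning.strategy" -> "Custom", // custom partitioning strategy
  "spark.cosmos.partitioning.targetedCount" -> "75"      // cap at the 75 physical partitions
)

val df = spark.read.format("cosmos.oltp").options(cosmosReadConfig).load()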
Exception or Stack Trace
There is no exception. However, the Spark UI shows increasing task counts:
Stage 0 (first run): 75 tasks
Stage 1 (second run): 150 tasks
Stage 2 (third run): 300 tasks
...
This occurs without repartitioning or shuffles in the job.
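To capture these counts without the Spark UI, a listener can log the task count of every completed stage. This is a minimal sketch using the standard Spark listener API; it assumes a running SparkSession named spark.

import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted}

// Logs the task count of each completed stage, so the growth
// (75 -> 150 -> 300 ...) can be recorded programmatically.
spark.sparkContext.addSparkListener(new SparkListener {
  override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit = {
    val info = stageCompleted.stageInfo
    println(s"Stage ${info.stageId}: ${info.numTasks} tasks")
  }
})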
To Reproduce
Use a Cosmos DB container with 75 physical partitions
Run the following Spark job repeatedly (or across multiple tenants)
Observe the increasing number of tasks in the Spark UI over time
val df = spark.read
.format("cosmos.oltp")
.option("spark.cosmos.accountEndpoint", "<>")
.option("spark.cosmos.accountKey", "<>")
.option("spark.cosmos.database", "<>")
.option("spark.cosmos.container", "<>")
.option("spark.cosmos.read.partitioning.strategy", "Custom")
.option("spark.cosmos.partitioning.targetedCount", "75")
.load()
println(s"Spark partition count: ${df.rdd.getNumPartitions}")
The printed partition count keeps increasing across runs, even though the Cosmos container's physical partition count stays at 75.
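For comparison, the container's physical partition count can be approximated from its feed ranges using the Cosmos Java SDK (azure-cosmos). This is a sketch under the assumption that one feed range corresponds to one physical partition; the endpoint, key, and names are placeholders.

import com.azure.cosmos.CosmosClientBuilder

// Counts the container's feed ranges (typically one per physical partition)
// so the value can be compared with df.rdd.getNumPartitions above.
val cosmosClient = new CosmosClientBuilder()
  .endpoint("<accountEndpoint>")
  .key("<accountKey>")
  .buildClient()

val feedRangeCount = cosmosClient
  .getDatabase("<database>")
  .getContainer("<container>")
  .getFeedRanges()
  .size()

println(s"Cosmos feed ranges (~ physical partitions): $feedRangeCount")
cosmosClient.close()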
Expected behavior
Spark should consistently generate 75 input partitions, matching the spark.cosmos.partitioning.targetedCount setting. This number should remain constant unless the Cosmos container's physical partition count changes or the configuration is explicitly updated.
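A simple sanity check expressing this expectation (a sketch only; it reuses the df from the snippet above and the configured value of 75):

// Fails fast if the Spark partition count drifts away from the targeted count.
val targetedCount = 75
val actualCount = df.rdd.getNumPartitions
require(actualCount == targetedCount,
  s"Expected $targetedCount Spark partitions (targetedCount) but got $actualCount")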
Screenshots
Spark UI screenshots showing the increasing task counts across stages and runs can be provided if needed.
Setup (please complete the following information):
OS: Ubuntu 20.04 (also tested in Databricks Runtime)
IDE: IntelliJ IDEA
Library: com.azure.cosmos.spark:azure-cosmos-spark_3-1_2-12:4.37.1
Java version: OpenJDK 11
Environment: Apache Spark 3.1.2 on Databricks Runtime 10.x
Frameworks: Plain Spark batch job (no streaming)
Additional Notes
The Cosmos container has 75 physical partitions — verified in Azure Portal.
There are no .repartition(), .groupBy(), or .union() operations in the job.
.coalesce(75) has been tested as a workaround (see the sketch after this list); it caps the task count, which suggests the issue lies in the connector's partition planning logic rather than in the job itself.
Using the Restrictive strategy also leads to similar task-count inflation.
This behavior breaks expected scaling guarantees and causes resource waste.
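The workaround mentioned above can be sketched as follows. It only hides the symptom by narrowing the partitions after load (no shuffle is involved), and the account details remain placeholders.

// Workaround sketch: coalesce the loaded DataFrame back down to the expected
// partition count. This caps the task count but does not fix the connector's
// partition planning.
val dfCapped = spark.read
  .format("cosmos.oltp")
  .option("spark.cosmos.accountEndpoint", "<>")
  .option("spark.cosmos.accountKey", "<>")
  .option("spark.cosmos.database", "<>")
  .option("spark.cosmos.container", "<>")
  .option("spark.cosmos.read.partitioning.strategy", "Custom")
  .option("spark.cosmos.partitioning.targetedCount", "75")
  .load()
  .coalesce(75)

println(s"Partition count after coalesce: ${dfCapped.rdd.getNumPartitions}")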