
[BUG][Spark] [Cosmos-Changefeed Connector] #46074

@pammusankolli

Description


Describe the bug
The Spark connector for Azure Cosmos DB (azure-cosmos-spark_3-1_2-12) creates an increasing number of Spark tasks over time, even when:

The Cosmos container has a fixed number of physical partitions (75)

The partitioning strategy is set to Custom

The number of partitions is explicitly capped with spark.cosmos.partitioning.targetedCount = 75

Despite this configuration, the number of Spark tasks continues to grow between job runs (e.g., 75 → 150 → 300...), even when nothing else has changed. This results in performance issues and unpredictable scaling.

Exception or Stack Trace
There is no exception. However, the Spark UI shows increasing task counts:

Stage 0 (first run): 75 tasks
Stage 1 (second run): 150 tasks
Stage 2 (third run): 300 tasks
...
This occurs without repartitioning or shuffles in the job.

To Reproduce
Use a Cosmos DB container with 75 physical partitions

Run the following Spark job repeatedly (or across multiple tenants)

Observe the increasing number of tasks in the Spark UI over time

val df = spark.read
  .format("cosmos.oltp")
  .option("spark.cosmos.accountEndpoint", "<>")
  .option("spark.cosmos.accountKey", "<>")
  .option("spark.cosmos.database", "<>")
  .option("spark.cosmos.container", "<>")
  .option("spark.cosmos.read.partitioning.strategy", "Custom")
  .option("spark.cosmos.partitioning.targetedCount", "75")
  .load()

println(s"Spark partition count: ${df.rdd.getNumPartitions}")
Across runs, the printed partition count keeps increasing, even though the Cosmos container's physical partition count stays at 75.
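
For clarity, a minimal sketch of how the growth can be observed by repeating the read with identical options (the placeholder connection values are stand-ins, just like the "<>" values above):

// Read the same container several times with identical options and print the
// Spark partition count after each read. Observed: 75, 150, 300, ... instead
// of a constant 75.
val cosmosOptions = Map(
  "spark.cosmos.accountEndpoint"            -> "<>",
  "spark.cosmos.accountKey"                 -> "<>",
  "spark.cosmos.database"                   -> "<>",
  "spark.cosmos.container"                  -> "<>",
  "spark.cosmos.read.partitioning.strategy" -> "Custom",
  "spark.cosmos.partitioning.targetedCount" -> "75"
)

(1 to 3).foreach { run =>
  val df = spark.read.format("cosmos.oltp").options(cosmosOptions).load()
  println(s"Run $run partition count: ${df.rdd.getNumPartitions}")
}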

Expected behavior
Spark should consistently generate 75 input partitions, matching the spark.cosmos.partitioning.targetedCount setting. This number should remain constant unless the Cosmos container's physical partition count changes or the configuration is explicitly updated.
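
A small sketch of the invariant that is expected to hold after each read (df is the DataFrame from the snippet above; 75 mirrors the targetedCount setting):

// The number of Spark input partitions should match
// spark.cosmos.partitioning.targetedCount on every run.
val expectedPartitions = 75
val actualPartitions = df.rdd.getNumPartitions
assert(actualPartitions == expectedPartitions,
  s"expected $expectedPartitions Cosmos input partitions, got $actualPartitions")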

Screenshots
Spark UI screenshots showing the increasing task counts across stages and runs can be attached if needed.

Setup (please complete the following information):
OS: Ubuntu 20.04 (also tested in Databricks Runtime)

IDE: IntelliJ IDEA (optional)

Library: com.azure.cosmos.spark:azure-cosmos-spark_3-1_2-12:4.37.1 (dependency declaration sketched after this list)

Java version: OpenJDK 11

Environment: Apache Spark 3.1.2 on Databricks Runtime 10.x

Frameworks: Plain Spark batch job (no streaming)
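
For reference, a minimal sketch of the dependency declaration, assuming an sbt build (coordinates taken from the Library line above):

// build.sbt — the artifact name already encodes Spark 3.1 and Scala 2.12
libraryDependencies += "com.azure.cosmos.spark" % "azure-cosmos-spark_3-1_2-12" % "4.37.1"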

Additional Notes
The Cosmos container has 75 physical partitions — verified in Azure Portal.

There are no .repartition(), .groupBy(), or union operations applied.

.coalesce(75) has been tested as a workaround (sketched after these notes); the fact that it caps the task count points to the connector's partition management logic rather than the job itself.

Using strategy Restrictive also leads to similar task count inflation.

This behavior breaks expected scaling guarantees and causes resource waste.
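
A minimal sketch of the .coalesce(75) workaround referenced above (df is the DataFrame from the repro snippet); it caps the partition count after the read but does not change the connector's partition planning:

// Cap the number of Spark partitions after the Cosmos read.
val capped = df.coalesce(75)
println(s"Partition count after coalesce: ${capped.rdd.getNumPartitions}") // prints 75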


    Labels

    Client · Cosmos · Service Attention · customer-reported · needs-team-attention · question
