fix(io): add connection pool semaphore to Azure Blob Storage backend #6305
Merged
rohitkulshreshtha merged 5 commits into Eventual-Inc:main on Mar 2, 2026
Azure was the only major cloud storage backend without a connection pool semaphore. As a result, reading multiple large Parquet files in parallel opened an unbounded number of concurrent connections (8 files × 50-100 ranges = 400-800+ connections), which led to server-side connection resets from Azure. Add `max_connections_per_io_thread` (default 8) to `AzureConfig` and `connection_pool_sema` to `AzureBlobSource`, matching the pattern used by the S3, GCS, and TOS backends. Extract `get_size_internal()` to avoid a deadlock when `get()` handles `GetRange::Suffix` requests: `get()` already holds a permit, so calling the permit-acquiring `get_size()` from inside it could wait forever.
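The core idea can be sketched as a shared counting semaphore whose permits are RAII guards. This is a std-only illustration, not the Daft implementation: the real backend is async and, judging by the `acquire_owned()` call in the sequence diagram, presumably uses an async semaphore such as tokio's. `Semaphore`, `Permit`, and the thread-per-request simulation `peak_in_flight` are hypothetical names. Without a semaphore, all 400-800+ range reads could be in flight at once; with one, peak concurrency never exceeds the pool size.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Minimal blocking counting semaphore (hypothetical std-only stand-in for an async one).
pub struct Semaphore {
    available: Mutex<usize>,
    cond: Condvar,
}

/// RAII permit: gives its slot back to the semaphore when dropped.
pub struct Permit {
    sema: Arc<Semaphore>,
}

impl Semaphore {
    pub fn new(permits: usize) -> Arc<Self> {
        Arc::new(Semaphore { available: Mutex::new(permits), cond: Condvar::new() })
    }

    /// Waits until a permit is free and takes it. The permit owns an `Arc`
    /// to the semaphore, so it can outlive the caller's stack frame.
    pub fn acquire_owned(self: Arc<Self>) -> Permit {
        {
            let mut available = self.available.lock().unwrap();
            while *available == 0 {
                available = self.cond.wait(available).unwrap();
            }
            *available -= 1;
        }
        Permit { sema: self }
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        *self.sema.available.lock().unwrap() += 1;
        self.sema.cond.notify_one();
    }
}

/// Hypothetical simulation: `requests` range reads, each on its own thread,
/// share one pool of `pool_size` permits. Returns the peak concurrency observed.
pub fn peak_in_flight(requests: usize, pool_size: usize) -> usize {
    let sema = Semaphore::new(pool_size);
    let in_flight = Arc::new(AtomicUsize::new(0));
    let peak = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..requests)
        .map(|_| {
            let sema = Arc::clone(&sema);
            let in_flight = Arc::clone(&in_flight);
            let peak = Arc::clone(&peak);
            thread::spawn(move || {
                // Held for the whole simulated download, like the real permit.
                let _permit = sema.acquire_owned();
                let now = in_flight.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);
                thread::sleep(Duration::from_millis(2)); // pretend to stream bytes
                in_flight.fetch_sub(1, Ordering::SeqCst);
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    peak.load(Ordering::SeqCst)
}
```

For example, `peak_in_flight(64, 8)` launches 64 simulated reads but never observes more than 8 running at the same time.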
Add the missing `max_connections` parameter documentation to the `AzureConfig` docstring for parity with `S3Config` and `GCSConfig`. Clamp `max_connections_per_io_thread` to a minimum of 1 to prevent a deadlock when a user passes `max_connections=0`.
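The clamp amounts to a one-line guard when sizing the pool. This is a minimal sketch, assuming the pool holds `max_connections_per_io_thread × num_io_threads` permits, as the sequence diagram's "max 8 × num_threads" note suggests; `pool_capacity` and `num_io_threads` are hypothetical names. With zero permits every acquire would wait forever, which is the deadlock the clamp prevents.

```rust
/// Hypothetical helper: total permits for the connection pool.
/// Assumes the user-facing `max_connections` is passed through as the per-thread limit.
pub fn pool_capacity(max_connections_per_io_thread: usize, num_io_threads: usize) -> usize {
    // Clamp to at least 1: a semaphore with zero permits blocks every acquire forever.
    max_connections_per_io_thread.max(1) * num_io_threads
}
```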
Greptile Summary
This PR adds connection pool management to the Azure Blob Storage backend, limiting concurrent connections to prevent resource exhaustion when reading multiple large files in parallel.
The implementation follows the established pattern from the S3, GCS, and TOS backends and includes comprehensive test coverage. This resolves the unbounded connection issue described in #6279.
Confidence Score: 5/5
Sequence Diagram
sequenceDiagram
participant Client
participant AzureBlobSource
participant Semaphore as Connection Pool<br/>Semaphore
participant Azure as Azure Blob<br/>Storage
Note over AzureBlobSource,Semaphore: Initialization (max 8 × num_threads)
AzureBlobSource->>Semaphore: Create with capacity
Note over Client,Azure: get() operation
Client->>AzureBlobSource: get(uri, range)
AzureBlobSource->>Semaphore: acquire_owned()
Semaphore-->>AzureBlobSource: permit
alt GetRange::Suffix
AzureBlobSource->>AzureBlobSource: get_size_internal()<br/>(no semaphore)
AzureBlobSource->>Azure: get_properties()
Azure-->>AzureBlobSource: file size
AzureBlobSource->>AzureBlobSource: calculate range
end
AzureBlobSource->>Azure: download with range
Azure-->>AzureBlobSource: byte stream
AzureBlobSource-->>Client: GetResult::Stream<br/>(with permit)
Note over Client,Semaphore: Permit released when<br/>stream is dropped
Note over Client,Azure: get_size() operation
Client->>AzureBlobSource: get_size(uri)
AzureBlobSource->>Semaphore: acquire()
Semaphore-->>AzureBlobSource: permit guard
AzureBlobSource->>AzureBlobSource: get_size_internal()
AzureBlobSource->>Azure: get_properties()
Azure-->>AzureBlobSource: file size
AzureBlobSource-->>Client: size
Note over AzureBlobSource,Semaphore: Permit released<br/>when guard drops
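The two paths in the diagram can be sketched synchronously. Everything here is a hypothetical std-only stand-in (`FakeBlobSource`, `GetRange`, `GetResult`, and the blocking `Semaphore` are not Daft's types): only the public entry points acquire a permit, the shared `get_size_internal()` never does, and `get()` moves its permit into the returned result so the connection slot stays occupied until the caller drops it. With a single permit, a suffix read still completes because it never asks for a second permit.

```rust
use std::sync::{Arc, Condvar, Mutex};

// Hypothetical blocking counting semaphore (std-only stand-in for an async one).
struct Semaphore {
    available: Mutex<usize>,
    cond: Condvar,
}

/// Permit that returns its slot when dropped.
struct Permit {
    sema: Arc<Semaphore>,
}

impl Semaphore {
    fn acquire_owned(self: Arc<Self>) -> Permit {
        {
            let mut available = self.available.lock().unwrap();
            while *available == 0 {
                available = self.cond.wait(available).unwrap();
            }
            *available -= 1;
        }
        Permit { sema: self }
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        *self.sema.available.lock().unwrap() += 1;
        self.sema.cond.notify_one();
    }
}

enum GetRange {
    Bounded(usize, usize), // byte range [start, end)
    Suffix(usize),         // last n bytes
}

/// Stand-in for GetResult::Stream: the permit travels with the data.
struct GetResult {
    bytes: Vec<u8>,
    _permit: Permit,
}

/// Hypothetical in-memory source standing in for AzureBlobSource.
struct FakeBlobSource {
    blob: Vec<u8>,
    connection_pool_sema: Arc<Semaphore>,
}

impl FakeBlobSource {
    fn new(blob: Vec<u8>, permits: usize) -> Self {
        let sema = Semaphore { available: Mutex::new(permits), cond: Condvar::new() };
        FakeBlobSource { blob, connection_pool_sema: Arc::new(sema) }
    }

    /// Plays the role of get_properties(); never touches the semaphore.
    fn get_size_internal(&self) -> usize {
        self.blob.len()
    }

    /// Public entry point: takes a permit, then delegates.
    fn get_size(&self) -> usize {
        let _permit = Arc::clone(&self.connection_pool_sema).acquire_owned();
        self.get_size_internal()
    }

    fn get(&self, range: GetRange) -> GetResult {
        let permit = Arc::clone(&self.connection_pool_sema).acquire_owned();
        let len = self.blob.len();
        let (start, end) = match range {
            GetRange::Bounded(s, e) => {
                let end = e.min(len);
                (s.min(end), end)
            }
            // Calling self.get_size() here would wait for a second permit while
            // this call holds one; with every permit taken it would never return.
            GetRange::Suffix(n) => {
                let size = self.get_size_internal();
                (size.saturating_sub(n), size)
            }
        };
        GetResult { bytes: self.blob[start..end].to_vec(), _permit: permit }
    }

    fn available_permits(&self) -> usize {
        *self.connection_pool_sema.available.lock().unwrap()
    }
}
```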
Last reviewed commit: 20583c3
Add missing max_connections_per_io_thread field to the AzureConfig section of the expected explain output string.
universalmind303 (Member) approved these changes on Feb 27, 2026:
LGTM, thanks @singularityDLW!
Great! Thanks a lot @singularityDLW
Changes Made
Add a connection pool semaphore to `AzureBlobSource`, matching the existing pattern in the S3, GCS, and TOS backends. Azure was the only major backend missing one, which caused unbounded concurrent connections (400-800+) when reading multiple large Parquet files in parallel.
Related Issues
Closes #6279