fix(io): add connection pool semaphore to Azure Blob Storage backend (#6305)
Changes Made
Add a connection pool semaphore to AzureBlobSource, matching the
existing pattern in S3/GCS/TOS backends. Azure was the only major
backend missing this, causing unbounded concurrent connections
(400-800+) when reading multiple large parquet files in parallel.
- Add max_connections_per_io_thread (default 8) to AzureConfig
- Add connection_pool_sema to AzureBlobSource with permit lifecycle tied
to GetResult::Stream
- Extract get_size_internal() to avoid deadlock for GetRange::Suffix
(Azure SDK doesn't support native suffix ranges)
- Update Python bindings, type stubs, SQL config
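
The permit lifecycle and the suffix-range deadlock described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the real backend is async and presumably uses an async semaphore, whereas this sketch swaps in a small std-only blocking semaphore so it runs without extra crates. `connection_pool_sema` and `get_size_internal` are named in the commit message; `BlobSource`, `BlobStream`, `Semaphore`, `Permit`, and `get_suffix` are hypothetical stand-ins.

```rust
use std::sync::{Arc, Condvar, Mutex};

/// Minimal counting semaphore (std-only stand-in for an async semaphore).
struct Semaphore {
    permits: Mutex<usize>,
    available: Condvar,
}

/// RAII guard: the permit goes back to the pool when this is dropped.
struct Permit {
    sema: Arc<Semaphore>,
}

impl Semaphore {
    fn new(permits: usize) -> Arc<Self> {
        Arc::new(Self { permits: Mutex::new(permits), available: Condvar::new() })
    }

    /// Blocks until a permit is free, then hands it out as a guard.
    fn acquire(self: &Arc<Self>) -> Permit {
        let mut free = self.permits.lock().unwrap();
        while *free == 0 {
            free = self.available.wait(free).unwrap();
        }
        *free -= 1;
        Permit { sema: Arc::clone(self) }
    }

    fn free_permits(&self) -> usize {
        *self.permits.lock().unwrap()
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        *self.sema.permits.lock().unwrap() += 1;
        self.sema.available.notify_one();
    }
}

/// Hypothetical streaming result: it owns the permit, so the connection
/// slot stays reserved until the stream itself is dropped.
struct BlobStream {
    chunks: std::vec::IntoIter<Vec<u8>>,
    _permit: Permit,
}

impl Iterator for BlobStream {
    type Item = Vec<u8>;
    fn next(&mut self) -> Option<Vec<u8>> {
        self.chunks.next()
    }
}

/// Hypothetical source holding an in-memory "blob" instead of a real client.
struct BlobSource {
    data: Vec<u8>,
    connection_pool_sema: Arc<Semaphore>,
}

impl BlobSource {
    /// Size lookup that does NOT take a permit; the caller must hold one.
    fn get_size_internal(&self) -> usize {
        self.data.len()
    }

    /// Public size lookup: takes its own permit for the duration of the call.
    fn get_size(&self) -> usize {
        let _permit = self.connection_pool_sema.acquire();
        self.get_size_internal()
    }

    /// Reads the last `n` bytes. Without native suffix ranges, the size must
    /// be looked up first to turn the suffix into an absolute range.
    fn get_suffix(&self, n: usize) -> BlobStream {
        let permit = self.connection_pool_sema.acquire();
        // Calling self.get_size() here would wait for a second permit while
        // still holding the first, which can deadlock once the pool is full.
        let size = self.get_size_internal();
        let start = size.saturating_sub(n);
        let chunks: Vec<Vec<u8>> =
            self.data[start..].chunks(2).map(|c| c.to_vec()).collect();
        BlobStream { chunks: chunks.into_iter(), _permit: permit }
    }
}
```

With a pool of one permit, `get_suffix` holds that permit until the returned stream is dropped, after which `get_size` can acquire it again. Had `get_suffix` called `get_size` instead of `get_size_internal`, the call would block forever in this single-permit setup.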
Related Issues
Closes #6279
src/common/io-config/src/python.rs
17 additions & 2 deletions
@@ -92,6 +92,7 @@ pub struct S3Credentials {
 /// anonymous (bool, optional): Whether or not to use "anonymous mode", which will access Azure without any credentials
 /// endpoint_url (str, optional): Custom URL to the Azure endpoint, e.g. ``https://my-account-name.blob.core.windows.net``. Overrides `use_fabric_endpoint` if set
 /// use_ssl (bool, optional): Whether or not to use SSL, which require accessing Azure over HTTPS rather than HTTP, defaults to True
+/// max_connections (int, optional): Maximum number of connections to Azure at any time per io thread, defaults to 8