s3_list_keys() returns empty results despite objects existing in S3
Description
s3_list_keys() returns an empty array when listing objects under certain prefixes, even though the objects exist and can be retrieved with s3_get(). The underlying S3 API (S3.list_objects_v2) correctly returns the objects with KeyCount=1, but s3_list_keys() fails to extract them.
Environment
- Julia version: 1.11.7
- AWSS3.jl version: v0.11.4
- AWS.jl version: v1.96.0
- OS: macOS
Minimal Reproducible Example
using AWSS3
using AWS
using AWS: @service
@service S3
aws = global_aws_config(; region="us-east-1")
bucket = "my-bucket"
prefix = "path/to/partition/marketplace_id=1/order_day=2025-09-01"
# This file exists and can be retrieved
key = "path/to/partition/marketplace_id=1/order_day=2025-09-01/part-00000.snappy.parquet"
parquet_object = s3_get(aws, bucket, key) # ✅ Works fine
# But listing returns empty
keys = collect(s3_list_keys(aws, bucket, prefix))
println("Found $(length(keys)) keys") # ❌ Returns: Found 0 keys
# However, the raw S3 API shows the object exists
response = S3.list_objects_v2(bucket, Dict("prefix" => prefix, "max-keys" => "10"))
println(response)
# ✅ Returns: KeyCount => "1", Contents => {...}
Expected Behavior
s3_list_keys() should return the keys that exist under the specified prefix, matching what the underlying S3 API returns.
Actual Behavior
s3_list_keys() returns an empty array, even though:
- The S3 API response shows KeyCount=1 with valid Contents
- s3_get() successfully retrieves the object using the full key
- s3_list_objects() returns a Channel that contains the object metadata when iterated
Diagnostic Output
julia> keys = collect(s3_list_keys(aws, bucket, prefix))
Any[]
julia> response = S3.list_objects_v2(bucket, Dict("prefix" => prefix, "max-keys" => "10"))
OrderedCollections.LittleDict{Union{String, Symbol}, Any, Vector{Union{String, Symbol}}, Vector{Any}} with 6 entries:
"Name" => "my-bucket"
"Prefix" => "path/to/partition/marketplace_id=1/order_day=2025-09-01/"
"KeyCount" => "1"
"MaxKeys" => "10"
"IsTruncated" => "false"
"Contents" => LittleDict{Union{String, Symbol}, Any, Vector{Union{String, Symbol}}, Vector{Any}}("Key"=>"path/to/partition/marketplace_id=1/order_day=2025-09-01/part-00000.snappy.parquet", ...)
Additional Context
Both s3_list_keys() and s3_list_objects() return empty results, even though the underlying S3 API confirms the objects exist:
julia> objects_channel = s3_list_objects(aws, bucket, prefix)
Channel{OrderedCollections.LittleDict}(128) (empty)
julia> keys = collect(s3_list_keys(aws, bucket, prefix))
Any[]
Workaround
Use the raw S3 API directly:
using AWS: @service
@service S3
response = S3.list_objects_v2(bucket, Dict("prefix" => prefix))
if haskey(response, "Contents")
    contents = response["Contents"]
    keys = isa(contents, Vector) ? [c["Key"] for c in contents] : [contents["Key"]]
end
Possible Cause
The issue may be related to how s3_list_keys() and s3_list_objects() parse the S3 API response when dealing with specific prefix patterns or when Contents contains a single object versus an array of objects. The Channel is created but never populated with data from the API response.
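To make the single-object versus array hypothesis concrete, here is a minimal sketch of normalizing Contents before extracting keys. It is not the AWSS3.jl implementation: extract_keys is a hypothetical helper, and the sample responses are plain Dicts shaped like the output above rather than real API results.

```julia
# Hypothetical helper, not part of AWSS3.jl: normalize the "Contents" field of a
# parsed list_objects_v2 response so one object and many objects are handled alike.
function extract_keys(response::AbstractDict)
    # No "Contents" entry means no objects matched the prefix.
    haskey(response, "Contents") || return String[]
    contents = response["Contents"]
    # A single object may be parsed as one dictionary instead of a one-element vector.
    entries = contents isa AbstractVector ? contents : [contents]
    return String[entry["Key"] for entry in entries]
end

# Stand-in responses (assumed shapes, no AWS call is made).
single = Dict("KeyCount" => "1", "Contents" => Dict("Key" => "a/part-00000.snappy.parquet"))
multiple = Dict("KeyCount" => "2", "Contents" => [Dict("Key" => "a/1"), Dict("Key" => "a/2")])
no_match = Dict("KeyCount" => "0")

println(extract_keys(single))
println(extract_keys(multiple))
println(extract_keys(no_match))
```

If the library's parsing assumes Contents is always a vector, the single-dictionary case would be the one that slips through, which would be consistent with the KeyCount=1 response shown above. This remains a guess until the parsing code is checked.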
I will look into this further and try to push a fix when I have some downtime, but I wanted to document and flag the issue in the meantime.