Calling wr.s3.read_parquet_metadata with a path that doesn't exist throws IndexError #2842

Closed
@lucasmo

Describe the bug

Calling read_parquet_metadata with an S3 path that doesn't exist raises an IndexError instead of a descriptive exception:

[ERROR] IndexError: list index out of range
Traceback (most recent call last):
  File "/var/task/obscured/obscured.py", line 123, in do_a_thing
    column_types, _ = wr.s3.read_parquet_metadata(
  File "/opt/python/awswrangler/_config.py", line 715, in wrapper
    return function(**args)
  File "/opt/python/awswrangler/_utils.py", line 178, in inner
    return func(*args, **kwargs)
  File "/opt/python/awswrangler/s3/_read_parquet.py", line 846, in read_parquet_metadata
    columns_types, partitions_types, _ = _read_parquet_metadata(
  File "/opt/python/awswrangler/s3/_read_parquet.py", line 140, in _read_parquet_metadata
    return reader.read_table_metadata(
  File "/opt/python/awswrangler/s3/_read.py", line 280, in read_table_metadata
    merged_schemas = _validate_schemas(schemas=schemas, validate_schema=False)
  File "/opt/python/awswrangler/s3/_read.py", line 304, in _validate_schemas
    first: pa.schema = schemas[0]
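
The last frame of the traceback indexes `schemas[0]`, which suggests the list of schemas is empty when no objects match the path. Below is a minimal, library-free sketch of that failure mode and of one possible guard. The function names and the `NoFilesFound` class are hypothetical stand-ins for illustration, not the actual awswrangler internals or its fix.

```python
from typing import Dict, List


class NoFilesFound(Exception):
    """Hypothetical stand-in for awswrangler.exceptions.NoFilesFound."""


def first_schema_unguarded(schemas: List[Dict[str, str]]) -> Dict[str, str]:
    # Mirrors the pattern in the traceback: index 0 of a possibly empty list.
    return schemas[0]


def first_schema_guarded(schemas: List[Dict[str, str]], path: str) -> Dict[str, str]:
    # Check for an empty list first so the caller gets a descriptive error.
    if not schemas:
        raise NoFilesFound(f"No files found on: {path}")
    return schemas[0]
```

With an empty list, the unguarded version raises IndexError, while the guarded version raises an error that names the path.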

How to Reproduce

import awswrangler
awswrangler.s3.read_parquet_metadata(path='s3://bucket-you-can-read/file-that-doesnt-exist')

Expected behavior

An exception like exceptions.NoFilesFound should be raised, or perhaps some kind of empty result returned? It's unclear what the correct behavior here should be, but it shouldn't be an IndexError :)
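
Until the library raises something more descriptive, a caller-side check is one possible workaround. This is only a sketch: `read_metadata_checked` and the local `NoFilesFound` class are hypothetical names, and the listing and reading functions are passed in as parameters so the example runs without AWS access.

```python
from typing import Callable, Dict, List, Tuple

Metadata = Tuple[Dict[str, str], Dict[str, str]]


class NoFilesFound(Exception):
    """Local stand-in; awswrangler ships its own exceptions.NoFilesFound."""


def read_metadata_checked(
    path: str,
    list_objects: Callable[[str], List[str]],
    read_metadata: Callable[[str], Metadata],
) -> Metadata:
    # List the objects under the path before reading any Parquet metadata.
    if not list_objects(path):
        raise NoFilesFound(f"No files found on: {path}")
    return read_metadata(path)


# Possible wiring with awswrangler (assumes it is installed and credentials are set):
#   import awswrangler as wr
#   column_types, partition_types = read_metadata_checked(
#       "s3://bucket-you-can-read/file-that-doesnt-exist",
#       lambda p: wr.s3.list_objects(path=p),
#       lambda p: wr.s3.read_parquet_metadata(path=p),
#   )
```

The extra listing call costs one more S3 request, and an object could still disappear between the check and the read, so this narrows the problem rather than eliminating it.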

Your project

No response

Screenshots

No response

OS

Linux

Python version

3.11

AWS SDK for pandas version

3.7.3

Additional context

No response

Metadata

Labels

bug: Something isn't working
