Skip to content

Loading Data From S3 Path in Sagemaker #878

@mahesh1amour

Description

@mahesh1amour

In Sagemaker Im tring to load the data set from S3 path as follows

`train_path = 's3://xxxxxxxxxx/xxxxxxxxxx/train.csv'
valid_path = 's3://xxxxxxxxxx/xxxxxxxxxx/validation.csv'
test_path = 's3://xxxxxxxxxx/xxxxxxxxxx/test.csv'

data_files = {}
data_files["train"] = train_path
data_files["validation"] = valid_path
data_files["test"] = test_path
extension = train_path.split(".")[-1]
datasets = load_dataset(extension, data_files=data_files, s3_enabled=True)
print(datasets)`

I getting an error of

algo-1-7plil_1 | File "main.py", line 21, in <module> algo-1-7plil_1 | datasets = load_dataset(extension, data_files=data_files) algo-1-7plil_1 | File "/opt/conda/lib/python3.6/site-packages/datasets/load.py", line 603, in load_dataset algo-1-7plil_1 | **config_kwargs, algo-1-7plil_1 | File "/opt/conda/lib/python3.6/site-packages/datasets/builder.py", line 155, in __init__ algo-1-7plil_1 | **config_kwargs, algo-1-7plil_1 | File "/opt/conda/lib/python3.6/site-packages/datasets/builder.py", line 305, in _create_builder_config algo-1-7plil_1 | m.update(str(os.path.getmtime(data_file))) algo-1-7plil_1 | File "/opt/conda/lib/python3.6/genericpath.py", line 55, in getmtime algo-1-7plil_1 | return os.stat(filename).st_mtime algo-1-7plil_1 | FileNotFoundError: [Errno 2] No such file or directory: 's3://lsmv-sagemaker/pubmedbert/test.csv

But when im trying with pandas , it is able to load from S3

Does the datasets library support S3 path to load

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or requestquestionFurther information is requested

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions