insufficient train.py with specific data folder #2531

JoJoistheBestOne · 2025-07-02T09:16:09Z

Currently, the system requires a data folder, with 'train' and 'val' subfolders, and then class subfolders within those. This is highly inefficient because my dataset is very large. Furthermore, I'm not just using this data to train a classification model; I need to train other models as well. This rigid structure makes it very inconvenient for other models to utilize the data.

Due to the sheer size of the dataset, I can't even create symbolic links because it would exceed the inode limit.

Does this repository support inputting train.csv and val.csv? Why is it designed to be so difficult to use?

rwightman (Maintainer) · 2025-07-02T17:56:35Z

Use WebDataset or TFDS if you need scale. TSV/CSV datasets don't really scale any better than folders, and there is too much variation in schemas to support them universally.
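
For readers who already have a train.csv, here is a minimal sketch of how the WebDataset suggestion could be followed: it packs the files listed in a CSV manifest into a small number of tar shards, which also sidesteps the inode limit mentioned in the question. It uses only the Python standard library and follows the general WebDataset convention that files sharing a basename inside a tar form one sample, with the class index stored as text in a `.cls` entry. The manifest schema (`path`, `label` columns), the function name `write_shards`, and the shard naming are assumptions for illustration, not part of this repository. Check the timm documentation for how its WebDataset reader expects shards to be named and described.

```python
import csv
import io
import os
import tarfile


def write_shards(csv_path, out_dir, samples_per_shard=1000):
    """Pack images listed in a CSV manifest into WebDataset-style tar shards.

    Assumes a hypothetical manifest with a header row and two columns:
    `path` (image file on disk) and `label` (integer class index).
    Returns the list of shard paths that were written.
    """
    os.makedirs(out_dir, exist_ok=True)
    shard_paths = []
    tar = None
    with open(csv_path, newline="") as f:
        for i, row in enumerate(csv.DictReader(f)):
            # Start a new shard every `samples_per_shard` samples.
            if i % samples_per_shard == 0:
                if tar is not None:
                    tar.close()
                shard_path = os.path.join(
                    out_dir, f"shard-{len(shard_paths):06d}.tar"
                )
                tar = tarfile.open(shard_path, "w")
                shard_paths.append(shard_path)

            # Entries with the same key (basename) belong to one sample.
            key = f"{i:09d}"
            ext = os.path.splitext(row["path"])[1].lower() or ".jpg"

            # Image bytes, stored under <key><ext>.
            tar.add(row["path"], arcname=key + ext)

            # Class index as text, stored under <key>.cls.
            data = str(int(row["label"])).encode("utf-8")
            info = tarfile.TarInfo(name=key + ".cls")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    if tar is not None:
        tar.close()
    return shard_paths
```

Calling `write_shards("train.csv", "shards/train")` and again for `val.csv` would leave the original files untouched, so other models can keep reading them from wherever they already live.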
