Description
Describe the bug
Calling load_dataset() from the datasets library (load.py) with a specific split (e.g., split="train") still downloads the data for all splits when streaming=False. The download happens in the builder_instance.download_and_prepare() call.
This wastes bandwidth and lengthens download times, especially for large datasets, even when only a single split is needed.
Steps to reproduce the bug
from datasets import load_dataset
dataset_name = "skbose/indian-english-nptel-v0"
# hf_token is a Hugging Face access token defined elsewhere
dataset = load_dataset(dataset_name, token=hf_token, split="test")
Expected behavior
When a specific split is requested with streaming=False, only the files for that split should be downloaded.
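For illustration, here is a rough sketch of a workaround that restricts the download through the data_files argument. It is not the library's own fix. It assumes the repository stores each split as Parquet shards named "<split>-*.parquet" (a common Hub layout, but not verified for this dataset). files_for_split and load_single_split are hypothetical helper names introduced only for this example.

```python
from fnmatch import fnmatch


def files_for_split(repo_files, split):
    """Pick the files that appear to belong to one split.

    Assumption: shards are named '<split>-*.parquet', for example
    'data/test-00000-of-00001.parquet'. Adjust the pattern if the
    repository uses a different layout.
    """
    pattern = f"{split}-*.parquet"
    return [
        path for path in repo_files
        if fnmatch(path.rsplit("/", 1)[-1], pattern)
    ]


def load_single_split(repo_id, split, token=None):
    """Hypothetical helper: load one split by passing only its files."""
    # Imported lazily so files_for_split can be used without these packages.
    from datasets import load_dataset
    from huggingface_hub import list_repo_files

    # List the repository contents without downloading any data files.
    repo_files = list_repo_files(repo_id, repo_type="dataset", token=token)
    selected = files_for_split(repo_files, split)
    if not selected:
        raise ValueError(f"No files matching split {split!r} in {repo_id}")

    # Only the files listed in data_files are fetched and prepared.
    return load_dataset(
        repo_id,
        data_files={split: selected},
        split=split,
        token=token,
    )
```

With this sketch, load_single_split("skbose/indian-english-nptel-v0", "test", token=hf_token) would request only the shards matched for the test split, provided the naming assumption holds. Passing streaming=True to load_dataset() is another way to avoid the full download, at the cost of getting an iterable dataset instead of a local one.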
Environment info
Dataset: skbose/indian-english-nptel-v0
Platform: M1 Apple Silicon
Python version: 3.12.9
datasets>=3.5.0