Hi there 👋
The tutorial tutorials/pretrain_redpajama.md says that both the full-size and the sample RedPajama datasets can be downloaded with Git LFS.
As of right now, however, this only works for the sample dataset.
On the Hugging Face page for the sample dataset, you can see the list of LFS files: https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T-Sample/tree/main
The page for the full-size version has no such list: https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T/tree/main
For the full-size variant, only URLs are provided.
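Until the tutorial is updated, a possible workaround for the full-size variant is to fetch the files directly from the provided URLs. Below is a minimal sketch using only the Python standard library. It assumes the URLs are saved one per line in a plain text file; the helper names and the output layout (mirroring each URL's path under a local directory) are my own choices, not something from the tutorial or the dataset page.

```python
import os
import urllib.request
from urllib.parse import urlparse


def read_urls(path):
    """Return the non-empty lines of a text file, assumed to hold one URL per line."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def local_path_for(url, out_dir):
    """Map a URL to a local path under out_dir that mirrors the URL's path."""
    rel = urlparse(url).path.lstrip("/")
    return os.path.join(out_dir, *rel.split("/"))


def download_all(urls, out_dir):
    """Download every URL, skipping files that were already fully downloaded."""
    for url in urls:
        dest = local_path_for(url, out_dir)
        if os.path.exists(dest):
            continue  # finished in an earlier run
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        tmp = dest + ".part"
        # Download to a temporary name first, so an interrupted download
        # is not mistaken for a finished file on the next run.
        urllib.request.urlretrieve(url, tmp)
        os.replace(tmp, dest)
```

Usage would be something like `download_all(read_urls("urls.txt"), "redpajama_full")`, where both names are placeholders. Interrupted downloads restart from scratch rather than resuming.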