I tried running the gsutil download step for MSCOCO but got this error:
(mds_env_gpu) brando9~/data/mds/mscoco $ gsutil -m rsync gs://images.cocodataset.org/train2017 train2017
BucketNotFoundException: 404 gs://images.cocodataset.org bucket does not exist.
What should I do?
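One possible workaround, sketched under the assumption that the HTTP zip URL from the commented-out part of the full attempt below is still served: probe the bucket with `gsutil ls` first and fall back to `wget` plus `unzip` if the probe fails. The function name `fetch_coco_train` is hypothetical and not part of meta-dataset.

```shell
# Hypothetical helper (an assumption, not part of meta-dataset):
# try the gsutil bucket first; if it is not reachable, fall back to
# downloading and extracting the HTTP zip instead.
fetch_coco_train() {
  local dest="$1"
  mkdir -p "$dest/train2017"
  if gsutil ls gs://images.cocodataset.org/train2017 >/dev/null 2>&1; then
    # Bucket is reachable: sync the individual .jpg files directly.
    gsutil -m rsync gs://images.cocodataset.org/train2017 "$dest/train2017"
  else
    # Bucket is missing (e.g. BucketNotFoundException): use the zip.
    wget http://images.cocodataset.org/zips/train2017.zip -O "$dest/train2017.zip"
    unzip -q "$dest/train2017.zip" -d "$dest"
  fi
}
```

It would be called as `fetch_coco_train "$MDS_DATA_PATH/mscoco"`. Both branches are meant to leave the images under `train2017/` inside the destination directory.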
Full attempt:
# 1. Download the 2017 train images and annotations from http://cocodataset.org/:
#You can use gsutil to download them to mscoco/:
mkdir -p $MDS_DATA_PATH/mscoco/
cd $MDS_DATA_PATH/mscoco/
mkdir -p train2017
# seems to directly download all files, no zip file needed
gsutil -m rsync gs://images.cocodataset.org/train2017 train2017
# todo: should have 118287(?) .jpg files (note: no unzipping needed)
ls $MDS_DATA_PATH/mscoco/train2017 | grep -c '\.jpg$'
# download & extract annotations_trainval2017.zip
# gsutil cp needs a destination; "." is $MDS_DATA_PATH/mscoco/ after the cd above
gsutil -m cp gs://images.cocodataset.org/annotations/annotations_trainval2017.zip .
unzip $MDS_DATA_PATH/mscoco/annotations_trainval2017.zip -d $MDS_DATA_PATH/mscoco
# todo says: 6?
ls $MDS_DATA_PATH/mscoco/annotations | grep -c '\.json$'
## Otherwise, you can download train2017.zip and annotations_trainval2017.zip and extract them into mscoco/. ETA ~36 min.
#mkdir -p $MDS_DATA_PATH/mscoco
#wget http://images.cocodataset.org/zips/train2017.zip -O $MDS_DATA_PATH/mscoco/train2017.zip
#wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip -O $MDS_DATA_PATH/mscoco/annotations_trainval2017.zip
## both zips should be there, note: downloading zip takes some time
#ls $MDS_DATA_PATH/mscoco/
## Extract them into mscoco/ (interpreting that as extracting both there, also because that is what the gsutil command above appears to do)
## takes some time, but good progress display
#unzip $MDS_DATA_PATH/mscoco/train2017.zip -d $MDS_DATA_PATH/mscoco
#unzip $MDS_DATA_PATH/mscoco/annotations_trainval2017.zip -d $MDS_DATA_PATH/mscoco
## two folders should be there, annotations and train2017 stuff
#ls $MDS_DATA_PATH/mscoco/
## check jpg imgs are there
#ls $MDS_DATA_PATH/mscoco/train2017
#ls $MDS_DATA_PATH/mscoco/train2017 | grep -c .jpg
## says: 118287 for a 2nd time
#ls $MDS_DATA_PATH/mscoco/annotations
#ls $MDS_DATA_PATH/mscoco/annotations | grep -c .json
## says: 6 for a 2nd time
## move them since it says so in the google NL instructions ref: for moving large num files https://stackoverflow.com/a/75034830/1601580 thanks chatgpt!
#ls $MDS_DATA_PATH/mscoco/train2017 | grep -c .jpg
#find $MDS_DATA_PATH/mscoco/train2017 -type f -print0 | xargs -0 mv -t $MDS_DATA_PATH/mscoco
#ls $MDS_DATA_PATH/mscoco | grep -c .jpg
## says: 118287 for both
#ls $MDS_DATA_PATH/mscoco/annotations/ | grep -c .json
#mv $MDS_DATA_PATH/mscoco/annotations/* $MDS_DATA_PATH/mscoco/
#ls $MDS_DATA_PATH/mscoco/ | grep -c .json
## says: 6 for both
# 2. Launch the conversion script:
python -m meta_dataset.dataset_conversion.convert_datasets_to_records \
--dataset=mscoco \
--mscoco_data_root=$MDS_DATA_PATH/mscoco \
--splits_root=$SPLITS \
--records_root=$RECORDS
# 3. Expect the conversion to take about 4 hours.
# 4. Find the following outputs in $RECORDS/mscoco/:
#80 tfrecords files named [0-79].tfrecords
ls $RECORDS/mscoco/ | grep -c '\.tfrecords$'
#dataset_spec.json (see note 1)
ls $RECORDS/mscoco/dataset_spec.json
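The file-count checks in the script above (118287 .jpg files, 6 .json files, 80 .tfrecords files) could be wrapped in a small helper so that a mismatch fails loudly rather than relying on reading the number by eye. This is a minimal sketch; `count_ext` and `check_count` are hypothetical names, not part of meta-dataset.

```shell
# Hypothetical helper: count regular files directly inside a directory
# whose names end in the given extension (no regex pitfalls as with grep).
count_ext() {
  local dir="$1" ext="$2"
  find "$dir" -maxdepth 1 -type f -name "*.${ext}" | wc -l | tr -d ' '
}

# Hypothetical helper: compare the actual count with the expected one and
# return non-zero on a mismatch so a calling script can stop early.
check_count() {
  local dir="$1" ext="$2" expected="$3" actual
  actual="$(count_ext "$dir" "$ext")"
  if [ "$actual" -eq "$expected" ]; then
    echo "OK: $actual .$ext files in $dir"
  else
    echo "MISMATCH: $actual .$ext files in $dir (expected $expected)" >&2
    return 1
  fi
}
```

For example, `check_count "$MDS_DATA_PATH/mscoco/train2017" jpg 118287` before launching the conversion, and `check_count "$RECORDS/mscoco" tfrecords 80` afterwards. The expected numbers are the ones quoted in the script comments above.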
related: brando90/pytorch-meta-dataset#20