This is my current download script. Does this look right to you?
# 1. Download the 2017 train images and annotations from http://cocodataset.org/:
# You can use gsutil to download them to mscoco/:
# cd $DATASRC/mscoco/
# mkdir -p train2017
# gsutil -m rsync gs://images.cocodataset.org/train2017 train2017
# gsutil -m cp gs://images.cocodataset.org/annotations/annotations_trainval2017.zip .
# unzip annotations_trainval2017.zip
# Otherwise, download train2017.zip and annotations_trainval2017.zip directly and extract them into mscoco/ (ETA ~36 min).
mkdir -p "$MDS_DATA_PATH/mscoco"
wget http://images.cocodataset.org/zips/train2017.zip -O "$MDS_DATA_PATH/mscoco/train2017.zip"
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip -O "$MDS_DATA_PATH/mscoco/annotations_trainval2017.zip"
# Both zips should now be listed (downloading them takes a while)
ls "$MDS_DATA_PATH/mscoco/"
# Extract both zips into mscoco/ (my reading of "extract them into mscoco/", and it matches what the gsutil commands above do)
# This takes a while, but unzip prints its progress
unzip "$MDS_DATA_PATH/mscoco/train2017.zip" -d "$MDS_DATA_PATH/mscoco"
unzip "$MDS_DATA_PATH/mscoco/annotations_trainval2017.zip" -d "$MDS_DATA_PATH/mscoco"
# Two folders should now exist: annotations/ and train2017/
ls "$MDS_DATA_PATH/mscoco/"
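# Check that the .jpg images and .json annotations are there
# (match the literal suffix: in grep, an unescaped "." matches any character)
ls "$MDS_DATA_PATH/mscoco/train2017"
ls "$MDS_DATA_PATH/mscoco/train2017" | grep -c '\.jpg$'
ls "$MDS_DATA_PATH/mscoco/annotations"
ls "$MDS_DATA_PATH/mscoco/annotations" | grep -c '\.json$'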
# Move the files up into mscoco/, since the natural-language instructions say so.
# For moving a large number of files, see https://stackoverflow.com/a/75034830/1601580 (thanks ChatGPT!)
find "$MDS_DATA_PATH/mscoco/train2017" -type f -print0 | xargs -0 mv -t "$MDS_DATA_PATH/mscoco"
ls "$MDS_DATA_PATH/mscoco/train2017" | grep -c '\.jpg$'  # should now print 0
ls "$MDS_DATA_PATH/mscoco" | grep -c '\.jpg$'  # should match the train2017/ count above
mv "$MDS_DATA_PATH"/mscoco/annotations/* "$MDS_DATA_PATH/mscoco/"
ls "$MDS_DATA_PATH/mscoco/" | grep -c '\.json$'
# 2. Launch the conversion script:
python -m meta_dataset.dataset_conversion.convert_datasets_to_records \
  --dataset=mscoco \
  --mscoco_data_root="$MDS_DATA_PATH/mscoco" \
  --splits_root="$SPLITS" \
  --records_root="$RECORDS"
# 3. Expect the conversion to take about 4 hours.
# 4. Find the following outputs in $RECORDS/mscoco/:
#   80 tfrecords files named [0-79].tfrecords
ls "$RECORDS/mscoco/" | grep -c '\.tfrecords$'
#   dataset_spec.json (see note 1)
ls "$RECORDS/mscoco/dataset_spec.json"
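Every path in the script comes from `MDS_DATA_PATH`, `SPLITS` and `RECORDS`, so if one of them is unset, `mkdir -p $MDS_DATA_PATH/mscoco` quietly becomes `mkdir -p /mscoco`. A possible preflight check, sketched below, could run at the top of the script. `require_vars` is a hypothetical helper name of my own, not something from meta-dataset. It uses `eval` for the indirect lookup so it also works in plain POSIX `sh`.

```shell
# Hypothetical helper (not part of meta-dataset): print an error for every
# listed variable that is unset or empty, and return non-zero if any is missing.
require_vars() {
  missing=0
  for name in "$@"; do
    # Indirect lookup: builds and evaluates e.g. value=${MDS_DATA_PATH:-}
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "error: $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Intended use, before any mkdir/wget:
# require_vars MDS_DATA_PATH SPLITS RECORDS || exit 1
```

The invocation is left as a comment so that sourcing the helper never exits a shell by accident.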
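The `ls | grep -c` file counts in the script could also go through one small helper. This is a sketch under my own assumptions: `count_ext` is a hypothetical name, and it relies on `find -maxdepth`, which GNU and BSD find support but POSIX does not specify. It counts only regular files directly inside the directory whose names end in the literal suffix, so a subfolder or a name like `xjpg.txt` does not inflate the count.

```shell
# Hypothetical helper: count regular files directly inside DIR whose names
# end in ".EXT" (a literal suffix match, not a regex).
count_ext() {
  dir=$1
  ext=$2
  # -maxdepth 1 skips subdirectories; tr strips the padding some wc versions add
  find "$dir" -maxdepth 1 -type f -name "*.$ext" | wc -l | tr -d ' '
}
```

For example, before the move, `count_ext "$MDS_DATA_PATH/mscoco/train2017" jpg` should print 118287 for the full COCO train2017 split, and `count_ext "$MDS_DATA_PATH/mscoco/annotations" json` should print 6.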
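Step 4 only counts `.tfrecords` files, so a gap such as a missing `42.tfrecords` alongside an extra stray file would slip through. A stricter check, sketched below, looks for each expected name `0.tfrecords` through `79.tfrecords` plus `dataset_spec.json`. `check_records` is a hypothetical helper name, and the expected file names are taken from step 4 above.

```shell
# Hypothetical helper: check that DIR holds 0.tfrecords .. (N-1).tfrecords
# and dataset_spec.json. Prints each missing file; returns 1 if any is absent.
check_records() {
  dir=$1
  n=$2
  status=0
  i=0
  while [ "$i" -lt "$n" ]; do
    if [ ! -f "$dir/$i.tfrecords" ]; then
      echo "missing: $dir/$i.tfrecords" >&2
      status=1
    fi
    i=$((i + 1))
  done
  if [ ! -f "$dir/dataset_spec.json" ]; then
    echo "missing: $dir/dataset_spec.json" >&2
    status=1
  fi
  return "$status"
}

# Intended use after the conversion finishes:
# check_records "$RECORDS/mscoco" 80
```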