# SpecFormer: Guarding Vision Transformer Robustness via Maximum Singular Value Penalization

This is the official code for "SpecFormer: Guarding Vision Transformer Robustness via Maximum Singular Value Penalization" (ECCV 2024).


## Dependencies
- Python 3.8.17
- torch 1.9.0
- torchvision 0.10.0
- timm 0.5.4

Run `pip install -r requirement.txt` to install all requirements.


## Directories

- `auto_LiRPA`: Contains the logger and `MultiAverageMeter`.
- `model_for_cifar`: Vanilla ViT variant models for the CIFAR-10 and CIFAR-100 experiments.
- `model_for_cifar_sn`: SpecFormer models for the CIFAR-10 and CIFAR-100 experiments.
- `model_for_imagenet`: Vanilla ViT variant models for the ImageNet and Imagenette experiments.
- `model_for_imagenet_sn`: SpecFormer models for the ImageNet and Imagenette experiments.
- `parser`: Python scripts for parsing command-line arguments.
  - `parser_cifar.py`: Parser for the CIFAR experiments.
  - `parser_imagenet.py`: Parser for the ImageNet experiments.
  - `parser_imagenette.py`: Parser for the Imagenette experiments.
- `robust_evaluate`: Python scripts for evaluating adversarial robustness.
  - `aa.py`: Evaluation under AutoAttack.
  - `fgsm.py`: Evaluation under the FGSM attack.
  - `pgd.py`: Evaluation under the PGD attack.
- `train`: Python scripts for training models.
  - `train_cifar.py`: Training script for the CIFAR experiments.
  - `train_imagenet.py`: Training script for the ImageNet experiments.
  - `train_imagenette.py`: Training script for the Imagenette experiments.
  - `utils.py`: Contains the data loading code.

## Data

- **CIFAR-10 and CIFAR-100**: Downloaded automatically when running `train_cifar`, via `datasets.CIFAR10(args.data_dir, train=True, transform=train_transform, download=True)`; a minimal loader sketch follows this list.
- **ImageNet**: Can be downloaded from [ImageNet](https://www.image-net.org/download.php).
- **Imagenette-v1**: Can be downloaded from [Imagenette-v1](https://s3.amazonaws.com/fast-ai-imageclas/imagenette.tgz).
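
For illustration, the sketch below builds such a CIFAR-10 loader. The transform pipeline and batch size are assumptions made for this example, not the repo's exact settings (the actual data loading code lives in `train/utils.py`):

```python
# Illustrative sketch only: the transform and batch size are assumptions;
# the repository's real pipeline is defined in train/utils.py.
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
from torchvision import datasets

train_transform = transforms.Compose([
    transforms.Resize(224),            # the ViT models here take 224x224 inputs
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# download=True fetches CIFAR-10 into the target directory on first use
train_set = datasets.CIFAR10("/data/cifar", train=True,
                             transform=train_transform, download=True)
train_loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=4)
```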


## Running

### CIFAR-10/100
```bash
CUDA_VISIBLE_DEVICES=0 python -m train.train_cifar --model "vit_small_patch16_224_sn" --dataset cifar10 --out-dir "/log/" --method 'CLEAN' --seed 0 --epochs 40 --data-dir /data/cifar --pen-for-qkv 1e-5 1e-5 1e-5

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m train.train_cifar --model "vit_base_patch16_224_sn" --dataset cifar100 --out-dir "/log/" --method 'AT' --seed 0 --epochs 40 --data-dir /data/cifar --pen-for-qkv 1e-5 1e-5 1e-5
```

You can switch to other ViT variants with the `--model` option, change the dataset with `--dataset`, and select a different training method with `--method`. The `--pen-for-qkv` option controls the penalization strength: its three values penalize the query, key, and value matrices, respectively.
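
As a rough illustration of what such a penalty can look like, the sketch below estimates the largest singular value of each attention projection with a few steps of power iteration and adds the weighted result to the training loss. Everything in it is an assumption for illustration (the helper names, the penalty form, and the timm-style layout where Q, K, and V are stacked row-wise in one `qkv` weight); the paper and the `*_sn` model files define the actual objective:

```python
import torch
import torch.nn.functional as F

def max_singular_value(W: torch.Tensor, n_iters: int = 5) -> torch.Tensor:
    """Estimate sigma_max(W) of a 2-D weight matrix via power iteration."""
    u = torch.randn(W.shape[0], device=W.device)
    with torch.no_grad():  # the iteration itself needs no gradient
        for _ in range(n_iters):
            v = F.normalize(W.t() @ u, dim=0)
            u = F.normalize(W @ v, dim=0)
    return torch.dot(u, W @ v)  # differentiable w.r.t. W

def msvp_penalty(qkv_weight: torch.Tensor, lambdas=(1e-5, 1e-5, 1e-5)) -> torch.Tensor:
    """Weighted max-singular-value penalty for one attention block's Q/K/V."""
    Wq, Wk, Wv = qkv_weight.chunk(3, dim=0)  # timm stacks Q, K, V row-wise
    return sum(lam * max_singular_value(W) for lam, W in zip(lambdas, (Wq, Wk, Wv)))

# Hypothetical use inside a training step, summing over transformer blocks:
# loss = task_loss + sum(msvp_penalty(blk.attn.qkv.weight) for blk in model.blocks)
```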

### ImageNet
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m train.train_imagenet --model "vit_base_patch16_224_in21k_sn" --batch-size-eval 128 --AA-batch 128 --out-dir "/log/" --method 'CLEAN' --seed 0 --data-dir /data/imagenet/ImageNet/ --pen-for-qkv 5e-3 6e-4 7e-5
```

### Imagenette
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m train.train_imagenette --model "deit_small_patch16_224_sn" --out-dir "/log/" --method 'CLEAN' --seed 0 --epochs 40 --data-dir /data/imagenette/ --pen-for-qkv 1e-5 1e-5 1e-5
```


## Acknowledgements
This repository is built upon the following repositories:
- [When-Adversarial-Training-Meets-Vision-Transformers](https://github.com/mo666666/When-Adversarial-Training-Meets-Vision-Transformers)
- [LipsFormer](https://github.com/IDEA-Research/LipsFormer)
- [pytorch-image-models](https://github.com/rwightman/pytorch-image-models)
- [vits-robustness-torch](https://github.com/dedeswim/vits-robustness-torch)


## Cite this work
If you find our code useful, please cite our paper!
```bibtex
@inproceedings{hu2024specformer,
  title={SpecFormer: Guarding Vision Transformer Robustness via Maximum Singular Value Penalization},
  author={Hu, Xixu and Zheng, Runkai and Wang, Jindong and Leung, Cheukhang and Wu, Qi and Xie, Xing},
  booktitle={European Conference on Computer Vision},
  year={2024},
  organization={Springer}
}
```