@@ -408,6 +408,8 @@ All model architecture families include variants with pretrained weights. There
* Aggregating Nested Transformers - https://arxiv.org/abs/2105.12723
* BEiT - https://arxiv.org/abs/2106.08254
+ * BEiT-V2 - https://arxiv.org/abs/2208.06366
+ * BEiT3 - https://arxiv.org/abs/2208.10442
* Big Transfer ResNetV2 (BiT) - https://arxiv.org/abs/1912.11370
* Bottleneck Transformers - https://arxiv.org/abs/2101.11605
* CaiT (Class-Attention in Image Transformers) - https://arxiv.org/abs/2103.17239
@@ -424,6 +426,7 @@ All model architecture families include variants with pretrained weights. There
* DPN (Dual-Path Network) - https://arxiv.org/abs/1707.01629
* EdgeNeXt - https://arxiv.org/abs/2206.10589
* EfficientFormer - https://arxiv.org/abs/2206.01191
+ * EfficientFormer-V2 - https://arxiv.org/abs/2212.08059
* EfficientNet (MBConvNet Family)
* EfficientNet NoisyStudent (B0-B7, L2) - https://arxiv.org/abs/1911.04252
* EfficientNet AdvProp (B0-B8) - https://arxiv.org/abs/1911.09665
@@ -440,12 +443,14 @@ All model architecture families include variants with pretrained weights. There
* EfficientViT (MSRA) - https://arxiv.org/abs/2305.07027
* EVA - https://arxiv.org/abs/2211.07636
* EVA-02 - https://arxiv.org/abs/2303.11331
+ * FasterNet - https://arxiv.org/abs/2303.03667
* FastViT - https://arxiv.org/abs/2303.14189
* FlexiViT - https://arxiv.org/abs/2212.08013
* FocalNet (Focal Modulation Networks) - https://arxiv.org/abs/2203.11926
* GCViT (Global Context Vision Transformer) - https://arxiv.org/abs/2206.09959
* GhostNet - https://arxiv.org/abs/1911.11907
* GhostNet-V2 - https://arxiv.org/abs/2211.12905
+ * GhostNet-V3 - https://arxiv.org/abs/2404.11202
* gMLP - https://arxiv.org/abs/2105.08050
* GPU-Efficient Networks - https://arxiv.org/abs/2006.14090
* Halo Nets - https://arxiv.org/abs/2103.12731
@@ -501,14 +506,19 @@ All model architecture families include variants with pretrained weights. There
* SelecSLS - https://arxiv.org/abs/1907.00837
* Selective Kernel Networks - https://arxiv.org/abs/1903.06586
* Sequencer2D - https://arxiv.org/abs/2205.01972
+ * SHViT - https://arxiv.org/abs/2401.16456
* SigLIP (image encoder) - https://arxiv.org/abs/2303.15343
* SigLIP 2 (image encoder) - https://arxiv.org/abs/2502.14786
+ * StarNet - https://arxiv.org/abs/2403.19967
+ * SwiftFormer - https://arxiv.org/abs/2303.15446
* Swin S3 (AutoFormerV2) - https://arxiv.org/abs/2111.14725
* Swin Transformer - https://arxiv.org/abs/2103.14030
* Swin Transformer V2 - https://arxiv.org/abs/2111.09883
+ * TinyViT - https://arxiv.org/abs/2207.10666
* Transformer-iN-Transformer (TNT) - https://arxiv.org/abs/2103.00112
* TResNet - https://arxiv.org/abs/2003.13630
* Twins (Spatial Attention in Vision Transformers) - https://arxiv.org/pdf/2104.13840.pdf
+ * VGG - https://arxiv.org/abs/1409.1556
* Visformer - https://arxiv.org/abs/2104.12533
* Vision Transformer - https://arxiv.org/abs/2010.11929
* ViTamin - https://arxiv.org/abs/2404.02132