```diff
 | MODEL FAMILY | MODEL NAME (Huggingface hub) | FP32 | BF16 | Static quantization INT8 | Weight only quantization INT8 | Weight only quantization INT4 |
 |:---:|:---:|:---:|:---:|:---:|:---:|:---:|
-|LLAMA| meta-llama/Llama-2-7b-hf | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|LLAMA| meta-llama/Llama-2-13b-hf | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|LLAMA| meta-llama/Llama-2-70b-hf | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|LLAMA| meta-llama/Meta-Llama-3-8B | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|LLAMA| meta-llama/Meta-Llama-3-70B | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|LLAMA| meta-llama/Meta-Llama-3.1-8B-Instruct | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|LLAMA| meta-llama/Llama-3.2-3B-Instruct | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|LLAMA| meta-llama/Llama-3.2-11B-Vision-Instruct | 🟩 | 🟩 | | 🟩 | |
-|GPT-J| EleutherAI/gpt-j-6b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|GPT-NEOX| EleutherAI/gpt-neox-20b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|DOLLY| databricks/dolly-v2-12b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|FALCON| tiiuae/falcon-7b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|FALCON| tiiuae/falcon-11b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|FALCON| tiiuae/falcon-40b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|OPT| facebook/opt-30b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|OPT| facebook/opt-1.3b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Bloom| bigscience/bloom-1b7 | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|CodeGen| Salesforce/codegen-2B-multi | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Baichuan| baichuan-inc/Baichuan2-7B-Chat | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Baichuan| baichuan-inc/Baichuan2-13B-Chat | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Baichuan| baichuan-inc/Baichuan-13B-Chat | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|ChatGLM| THUDM/chatglm3-6b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|ChatGLM| THUDM/chatglm2-6b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|GPTBigCode| bigcode/starcoder | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|T5| google/flan-t5-xl | 🟩 | 🟩 | 🟩 | 🟩 | |
-|MPT| mosaicml/mpt-7b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Mistral| mistralai/Mistral-7B-v0.1 | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Mixtral| mistralai/Mixtral-8x7B-v0.1 | 🟩 | 🟩 | | 🟩 | 🟩 |
-|Stablelm| stabilityai/stablelm-2-1_6b | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Qwen| Qwen/Qwen-7B-Chat | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Qwen| Qwen/Qwen2-7B | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|LLaVA| liuhaotian/llava-v1.5-7b | 🟩 | 🟩 | | 🟩 | 🟩 |
-|GIT| microsoft/git-base | 🟩 | 🟩 | | 🟩 | |
-|Yuan| IEITYuan/Yuan2-102B-hf | 🟩 | 🟩 | | 🟩 | |
-|Phi| microsoft/phi-2 | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Phi| microsoft/Phi-3-mini-4k-instruct | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Phi| microsoft/Phi-3-mini-128k-instruct | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Phi| microsoft/Phi-3-medium-4k-instruct | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Phi| microsoft/Phi-3-medium-128k-instruct | 🟩 | 🟩 | 🟩 | 🟩 | 🟩 |
-|Whisper| openai/whisper-large-v2 | 🟩 | 🟩 | 🟩 | 🟩 | |
-|Maira| microsoft/maira-2 | 🟩 | 🟩 | | 🟩 | |
-|Jamba| ai21labs/Jamba-v0.1 | 🟩 | 🟩 | | 🟩 | |
+|LLAMA| meta-llama/Llama-2-7b-hf | ✅ | ✅ | ✅ | ✅ | ✅ |
+|LLAMA| meta-llama/Llama-2-13b-hf | ✅ | ✅ | ✅ | ✅ | ✅ |
+|LLAMA| meta-llama/Llama-2-70b-hf | ✅ | ✅ | ✅ | ✅ | ✅ |
+|LLAMA| meta-llama/Meta-Llama-3-8B | ✅ | ✅ | ✅ | ✅ | ✅ |
+|LLAMA| meta-llama/Meta-Llama-3-70B | ✅ | ✅ | ✅ | ✅ | ✅ |
+|LLAMA| meta-llama/Meta-Llama-3.1-8B-Instruct | ✅ | ✅ | ✅ | ✅ | ✅ |
+|LLAMA| meta-llama/Llama-3.2-3B-Instruct | ✅ | ✅ | ✅ | ✅ | ✅ |
+|LLAMA| meta-llama/Llama-3.2-11B-Vision-Instruct | ✅ | ✅ | | ✅ | ✅ |
+|GPT-J| EleutherAI/gpt-j-6b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|GPT-NEOX| EleutherAI/gpt-neox-20b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|DOLLY| databricks/dolly-v2-12b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|FALCON| tiiuae/falcon-7b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|FALCON| tiiuae/falcon-11b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|FALCON| tiiuae/falcon-40b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|OPT| facebook/opt-30b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|OPT| facebook/opt-1.3b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Bloom| bigscience/bloom-1b7 | ✅ | ✅ | ✅ | ✅ | ✅ |
+|CodeGen| Salesforce/codegen-2B-multi | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Baichuan| baichuan-inc/Baichuan2-7B-Chat | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Baichuan| baichuan-inc/Baichuan2-13B-Chat | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Baichuan| baichuan-inc/Baichuan-13B-Chat | ✅ | ✅ | ✅ | ✅ | ✅ |
+|ChatGLM| THUDM/chatglm3-6b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|ChatGLM| THUDM/chatglm2-6b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|GPTBigCode| bigcode/starcoder | ✅ | ✅ | ✅ | ✅ | ✅ |
+|T5| google/flan-t5-xl | ✅ | ✅ | ✅ | ✅ | ✅ |
+|MPT| mosaicml/mpt-7b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Mistral| mistralai/Mistral-7B-v0.1 | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Mixtral| mistralai/Mixtral-8x7B-v0.1 | ✅ | ✅ | | ✅ | ✅ |
+|Stablelm| stabilityai/stablelm-2-1_6b | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Qwen| Qwen/Qwen-7B-Chat | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Qwen| Qwen/Qwen2-7B | ✅ | ✅ | ✅ | ✅ | ✅ |
+|LLaVA| liuhaotian/llava-v1.5-7b | ✅ | ✅ | | ✅ | ✅ |
+|GIT| microsoft/git-base | ✅ | ✅ | | ✅ | ✅ |
+|Yuan| IEITYuan/Yuan2-102B-hf | ✅ | ✅ | | ✅ | |
+|Phi| microsoft/phi-2 | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Phi| microsoft/Phi-3-mini-4k-instruct | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Phi| microsoft/Phi-3-mini-128k-instruct | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Phi| microsoft/Phi-3-medium-4k-instruct | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Phi| microsoft/Phi-3-medium-128k-instruct | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Whisper| openai/whisper-large-v2 | ✅ | ✅ | ✅ | ✅ | ✅ |
+|Maira| microsoft/maira-2 | ✅ | ✅ | | ✅ | ✅ |
+|Jamba| ai21labs/Jamba-v0.1 | ✅ | ✅ | | ✅ | ✅ |
+|DeepSeek| deepseek-ai/DeepSeek-V2.5-1210 | ✅ | ✅ | | ✅ | ✅ |

 ## 1.2 Verified for distributed inference mode via DeepSpeed

 | MODEL FAMILY | MODEL NAME (Huggingface hub) | BF16 | Weight only quantization INT8 |
 |:---:|:---:|:---:|:---:|
-|LLAMA| meta-llama/Llama-2-7b-hf | 🟩 | 🟩 |
-|LLAMA| meta-llama/Llama-2-13b-hf | 🟩 | 🟩 |
-|LLAMA| meta-llama/Llama-2-70b-hf | 🟩 | 🟩 |
-|LLAMA| meta-llama/Meta-Llama-3-8B | 🟩 | 🟩 |
-|LLAMA| meta-llama/Meta-Llama-3-70B | 🟩 | 🟩 |
-|LLAMA| meta-llama/Meta-Llama-3.1-8B-Instruct | 🟩 | 🟩 |
-|LLAMA| meta-llama/Llama-3.2-3B-Instruct | 🟩 | 🟩 |
-|LLAMA| meta-llama/Llama-3.2-11B-Vision-Instruct | 🟩 | 🟩 |
-|GPT-J| EleutherAI/gpt-j-6b | 🟩 | 🟩 |
-|GPT-NEOX| EleutherAI/gpt-neox-20b | 🟩 | 🟩 |
-|DOLLY| databricks/dolly-v2-12b | 🟩 | 🟩 |
-|FALCON| tiiuae/falcon-11b | 🟩 | 🟩 |
-|FALCON| tiiuae/falcon-40b | 🟩 | 🟩 |
-|OPT| facebook/opt-30b | 🟩 | 🟩 |
-|OPT| facebook/opt-1.3b | 🟩 | 🟩 |
-|Bloom| bigscience/bloom-1b7 | 🟩 | 🟩 |
-|CodeGen| Salesforce/codegen-2B-multi | 🟩 | 🟩 |
-|Baichuan| baichuan-inc/Baichuan2-7B-Chat | 🟩 | 🟩 |
-|Baichuan| baichuan-inc/Baichuan2-13B-Chat | 🟩 | 🟩 |
-|Baichuan| baichuan-inc/Baichuan-13B-Chat | 🟩 | 🟩 |
-|GPTBigCode| bigcode/starcoder | 🟩 | 🟩 |
-|T5| google/flan-t5-xl | 🟩 | 🟩 |
-|Mistral| mistralai/Mistral-7B-v0.1 | 🟩 | 🟩 |
-|Mistral| mistralai/Mixtral-8x7B-v0.1 | 🟩 | 🟩 |
-|MPT| mosaicml/mpt-7b | 🟩 | 🟩 |
-|Stablelm| stabilityai/stablelm-2-1_6b | 🟩 | 🟩 |
-|Qwen| Qwen/Qwen-7B-Chat | 🟩 | 🟩 |
-|Qwen| Qwen/Qwen2-7B | 🟩 | 🟩 |
-|GIT| microsoft/git-base | 🟩 | 🟩 |
-|Phi| microsoft/phi-2 | 🟩 | 🟩 |
-|Phi| microsoft/Phi-3-mini-4k-instruct | 🟩 | 🟩 |
-|Phi| microsoft/Phi-3-mini-128k-instruct | 🟩 | 🟩 |
-|Phi| microsoft/Phi-3-medium-4k-instruct | 🟩 | 🟩 |
-|Phi| microsoft/Phi-3-medium-128k-instruct | 🟩 | 🟩 |
-|Whisper| openai/whisper-large-v2 | 🟩 | 🟩 |
+|LLAMA| meta-llama/Llama-2-7b-hf | ✅ | ✅ |
+|LLAMA| meta-llama/Llama-2-13b-hf | ✅ | ✅ |
+|LLAMA| meta-llama/Llama-2-70b-hf | ✅ | ✅ |
+|LLAMA| meta-llama/Meta-Llama-3-8B | ✅ | ✅ |
+|LLAMA| meta-llama/Meta-Llama-3-70B | ✅ | ✅ |
+|LLAMA| meta-llama/Meta-Llama-3.1-8B-Instruct | ✅ | ✅ |
+|LLAMA| meta-llama/Llama-3.2-3B-Instruct | ✅ | ✅ |
+|LLAMA| meta-llama/Llama-3.2-11B-Vision-Instruct | ✅ | ✅ |
+|GPT-J| EleutherAI/gpt-j-6b | ✅ | ✅ |
+|GPT-NEOX| EleutherAI/gpt-neox-20b | ✅ | ✅ |
+|DOLLY| databricks/dolly-v2-12b | ✅ | ✅ |
+|FALCON| tiiuae/falcon-11b | ✅ | ✅ |
+|FALCON| tiiuae/falcon-40b | ✅ | ✅ |
+|OPT| facebook/opt-30b | ✅ | ✅ |
+|OPT| facebook/opt-1.3b | ✅ | ✅ |
+|Bloom| bigscience/bloom-1b7 | ✅ | ✅ |
+|CodeGen| Salesforce/codegen-2B-multi | ✅ | ✅ |
+|Baichuan| baichuan-inc/Baichuan2-7B-Chat | ✅ | ✅ |
+|Baichuan| baichuan-inc/Baichuan2-13B-Chat | ✅ | ✅ |
+|Baichuan| baichuan-inc/Baichuan-13B-Chat | ✅ | ✅ |
+|GPTBigCode| bigcode/starcoder | ✅ | ✅ |
+|T5| google/flan-t5-xl | ✅ | ✅ |
+|Mistral| mistralai/Mistral-7B-v0.1 | ✅ | ✅ |
+|Mistral| mistralai/Mixtral-8x7B-v0.1 | ✅ | ✅ |
+|MPT| mosaicml/mpt-7b | ✅ | ✅ |
+|Stablelm| stabilityai/stablelm-2-1_6b | ✅ | ✅ |
+|Qwen| Qwen/Qwen-7B-Chat | ✅ | ✅ |
+|Qwen| Qwen/Qwen2-7B | ✅ | ✅ |
+|GIT| microsoft/git-base | ✅ | ✅ |
+|Phi| microsoft/phi-2 | ✅ | ✅ |
+|Phi| microsoft/Phi-3-mini-4k-instruct | ✅ | ✅ |
+|Phi| microsoft/Phi-3-mini-128k-instruct | ✅ | ✅ |
+|Phi| microsoft/Phi-3-medium-4k-instruct | ✅ | ✅ |
+|Phi| microsoft/Phi-3-medium-128k-instruct | ✅ | ✅ |
+|Whisper| openai/whisper-large-v2 | ✅ | ✅ |

 *Note*: The above verified models (including other models in the same model family, like "codellama/CodeLlama-7b-hf" from LLAMA family)
 are well supported with all optimizations like indirect access KV cache, fused ROPE, and customized linear kernels.
```
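The two quantization families in the first table differ in what gets quantized: "Weight only quantization" stores only the weights as low-bit integers (INT8 or INT4) and keeps activations in floating point, while "Static quantization INT8" also quantizes activations using scales calibrated ahead of time. The pure-Python sketch below illustrates only the weight-only case, under simple assumptions (symmetric, per-output-channel scales, round-to-nearest). The helpers `quantize_weights` and `woq_linear` are hypothetical illustrations, not the library's actual kernels or API.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def quantize_weights(weight: Matrix, bits: int) -> Tuple[List[List[int]], List[float]]:
    """Symmetric per-output-channel quantization (hypothetical helper).

    Each row of `weight` is one output channel and gets its own scale, chosen so
    the row's largest absolute value maps to the largest signed integer.
    The range is kept symmetric (-qmax..qmax), so -2**(bits-1) is never used.
    """
    qmax = 2 ** (bits - 1) - 1  # 127 for INT8, 7 for INT4
    q_rows: List[List[int]] = []
    scales: List[float] = []
    for row in weight:
        max_abs = max(abs(w) for w in row)
        # Avoid dividing by zero for an all-zero row.
        scale = max_abs / qmax if max_abs > 0 else 1.0
        q_rows.append([max(-qmax, min(qmax, round(w / scale))) for w in row])
        scales.append(scale)
    return q_rows, scales


def woq_linear(x: List[float], q_weight: List[List[int]], scales: List[float]) -> List[float]:
    """Compute y = W @ x with W stored as integers (hypothetical helper).

    Only the weights are quantized; `x` stays in floating point. Multiplying the
    integer-weight dot product by the row scale is equivalent to dequantizing
    each weight (q * scale) before taking the dot product.
    """
    return [
        sum(xi * qi for xi, qi in zip(x, q_row)) * scale
        for q_row, scale in zip(q_weight, scales)
    ]


if __name__ == "__main__":
    weight = [[0.50, -1.00, 0.25], [2.00, 0.10, -0.30]]
    x = [1.0, 2.0, 3.0]
    reference = [sum(w * xi for w, xi in zip(row, x)) for row in weight]
    for bits in (8, 4):
        q, s = quantize_weights(weight, bits)
        approx = woq_linear(x, q, s)
        err = max(abs(a - b) for a, b in zip(approx, reference))
        print(f"INT{bits}: output={approx}, max abs error={err:.4f}")
```

INT4 offers a much coarser grid (integers -7..7 in this sketch versus -127..127 for INT8), so its reconstruction error is larger; practical INT4 schemes often narrow that gap by using a separate scale per small group of weights rather than one per output channel.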
|