Last updated: 2025-12-23 23:36
This wiki contains translation quality benchmarks for 15 LLMs across 19 target languages. Each translation is scored on a 1-10 scale, and every score maps to a quality band:
| Indicator | Range | Label |
|---|---|---|
| 🟢 | 9.0-10 | Excellent |
| 🟡 | 7.0-8.9 | Good |
| 🟠 | 5.0-6.9 | Acceptable |
| 🔴 | 3.0-4.9 | Poor |
| ⚫ | 1.0-2.9 | Failed |
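For reference, the banding can be expressed as a small helper function. This is a minimal sketch in Python, assuming each band runs from its lower bound up to, but not including, the next band's lower bound; the function name and exact cut-offs are illustrative, not taken from the benchmark code.

```python
def score_indicator(score: float) -> tuple[str, str]:
    """Map a 1-10 quality score to its indicator emoji and label (assumed banding)."""
    bands = [
        (9.0, "🟢", "Excellent"),
        (7.0, "🟡", "Good"),
        (5.0, "🟠", "Acceptable"),
        (3.0, "🔴", "Poor"),
    ]
    for lower, emoji, label in bands:
        if score >= lower:
            return emoji, label
    return "⚫", "Failed"  # anything below 3.0 falls into the lowest band


print(score_indicator(7.5))  # ('🟡', 'Good')
print(score_indicator(6.9))  # ('🟠', 'Acceptable'), just below the "Good" threshold
```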
Overall performance across all tested languages:
| Rank | Model | Avg Score | Accuracy | Fluency | Style | Tests |
|---|---|---|---|---|---|---|
| 1 | mistralai/mistral-medium-3.1 | 🟡 7.5 | 7.9 | 7.3 | 7.3 | 95 |
| 2 | gemma3:27b-it-qat | 🟡 7.1 | 7.6 | 7.0 | 6.8 | 95 |
| 3 | gemma3:27b | 🟡 7.1 | 7.7 | 7.1 | 6.8 | 95 |
| 4 | ministral-3:14b | 🟠 6.9 | 7.4 | 6.9 | 6.6 | 95 |
| 5 | qwen3:30b | 🟠 6.7 | 7.4 | 6.7 | 6.4 | 95 |
| 6 | gemma3:12b | 🟠 6.7 | 7.3 | 6.6 | 6.4 | 95 |
| 7 | qwen3:30b-instruct | 🟠 6.6 | 7.2 | 6.6 | 6.2 | 95 |
| 8 | mistral-small:24b | 🟠 6.4 | 7.2 | 6.4 | 6.2 | 95 |
| 9 | ministral-3 | 🟠 6.3 | 7.0 | 6.2 | 5.9 | 95 |
| 10 | qwen3:14b | 🟠 6.0 | 6.7 | 6.0 | 5.7 | 95 |
| 11 | qwen3:4b | 🟠 5.9 | 6.7 | 5.8 | 5.5 | 95 |
| 12 | gemma3:4b | 🟠 5.7 | 6.5 | 5.8 | 5.3 | 95 |
| 13 | qwen3:8b | 🟠 5.5 | 6.3 | 5.4 | 5.2 | 95 |
| 14 | llama3.1:8b | 🔴 4.2 | 4.9 | 4.2 | 3.8 | 95 |
| 15 | llama3.2 | ⚫ 2.5 | 3.4 | 2.5 | 2.3 | 95 |
Top 15 target languages by average translation quality:
| Rank | Language | Native | Avg Score | Best Model | Tests |
|---|---|---|---|---|---|
| 1 | Spanish | Español | 🟡 7.2 | mistralai/mistral-medium-3.1 | 75 |
| 2 | French | Français | 🟡 7.1 | mistralai/mistral-medium-3.1 | 75 |
| 3 | Portuguese | Português | 🟡 7.1 | qwen3:30b-instruct | 75 |
| 4 | Italian | Italiano | 🟠 6.9 | gemma3:27b | 75 |
| 5 | Chinese (Traditional) | 繁體中文 | 🟠 6.8 | qwen3:30b | 75 |
| 6 | Chinese (Simplified) | 简体中文 | 🟠 6.8 | qwen3:30b-instruct | 75 |
| 7 | German | Deutsch | 🟠 6.7 | ministral-3:14b | 75 |
| 8 | Russian | Русский | 🟠 6.4 | mistralai/mistral-medium-3.1 | 75 |
| 9 | Vietnamese | Tiếng Việt | 🟠 6.3 | mistralai/mistral-medium-3.1 | 75 |
| 10 | Polish | Polski | 🟠 6.1 | mistralai/mistral-medium-3.1 | 75 |
| 11 | Ukrainian | Українська | 🟠 5.9 | mistralai/mistral-medium-3.1 | 75 |
| 12 | Thai | ไทย | 🟠 5.9 | mistralai/mistral-medium-3.1 | 75 |
| 13 | Arabic | العربية | 🟠 5.8 | gemma3:27b-it-qat | 75 |
| 14 | Japanese | 日本語 | 🟠 5.6 | gemma3:27b | 75 |
| 15 | Hindi | हिन्दी | 🟠 5.5 | gemma3:27b | 75 |
Benchmark summary:
- Total Models Tested: 15
- Total Languages: 19
- Total Translations: 1425
- Evaluator Model: anthropic/claude-haiku-4.5
- Source Language: English
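These totals are mutually consistent if every model is tested on every language the same number of times. A quick arithmetic check, assuming 5 translation passes per model-language pair (inferred from 1425 / (15 × 19), not stated explicitly):

```python
# Sanity-check the benchmark totals. The pass count is an inference, not documented.
models = 15
languages = 19
passes_per_pair = 1425 // (models * languages)  # = 5 (assumed passes per model-language pair)

assert models * languages * passes_per_pair == 1425   # Total Translations
assert languages * passes_per_pair == 95              # "Tests" per model in the rankings table
assert models * passes_per_pair == 75                 # "Tests" per language in the language tables
```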
Latin-script European languages:
| Language | Avg Score | Best Model |
|---|---|---|
| Spanish | 🟡 7.2 | mistralai/mistral-medium-3.1 |
| French | 🟡 7.1 | mistralai/mistral-medium-3.1 |
| Portuguese | 🟡 7.1 | qwen3:30b-instruct |
| Italian | 🟠 6.9 | gemma3:27b |
| German | 🟠 6.7 | ministral-3:14b |
| Polish | 🟠 6.1 | mistralai/mistral-medium-3.1 |
Asian languages:
| Language | Avg Score | Best Model |
|---|---|---|
| Chinese (Traditional) | 🟠 6.8 | qwen3:30b |
| Chinese (Simplified) | 🟠 6.8 | qwen3:30b-instruct |
| Vietnamese | 🟠 6.3 | mistralai/mistral-medium-3.1 |
| Thai | 🟠 5.9 | mistralai/mistral-medium-3.1 |
| Japanese | 🟠 5.6 | gemma3:27b |
| Hindi | 🟠 5.5 | gemma3:27b |
| Korean | 🟠 5.5 | mistralai/mistral-medium-3.1 |
| Tamil | 🔴 4.7 | mistralai/mistral-medium-3.1 |
| Bengali | 🔴 4.7 | mistralai/mistral-medium-3.1 |
Cyrillic-script languages:
| Language | Avg Score | Best Model |
|---|---|---|
| Russian | 🟠 6.4 | mistralai/mistral-medium-3.1 |
| Ukrainian | 🟠 5.9 | mistralai/mistral-medium-3.1 |
Right-to-left languages:
| Language | Avg Score | Best Model |
|---|---|---|
| Arabic | 🟠 5.8 | gemma3:27b-it-qat |
| Hebrew | 🔴 4.5 | gemma3:27b-it-qat |
Detailed results and resources:
- By Language: All Languages
- By Model: All Models
- Benchmark Documentation: How to Run Benchmarks
- Raw Data: Download JSON
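For custom analysis beyond these tables, the raw results can be aggregated directly. A minimal sketch, assuming the downloaded JSON is a flat list of per-translation records with `model`, `language`, and `score` fields; the file name and field names are assumptions and may differ from the actual export:

```python
import json
from collections import defaultdict

# Assumed file name and record layout; adjust to match the actual JSON export.
with open("benchmark_results.json", encoding="utf-8") as f:
    records = json.load(f)

# Average score per model across all languages and passes.
sums = defaultdict(float)
counts = defaultdict(int)
for record in records:
    sums[record["model"]] += record["score"]
    counts[record["model"]] += 1

ranking = sorted(sums, key=lambda model: sums[model] / counts[model], reverse=True)
for model in ranking:
    print(f"{model}: {sums[model] / counts[model]:.1f}")
```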
Generated by the TranslateBookWithLLM benchmark system.