hydropix edited this page Dec 23, 2025 · 15 revisions

# Translation Quality Benchmark

Last updated: 2025-12-23 23:36

This wiki documents translation quality benchmarks for 15 LLMs across 19 target languages.

## Score Legend

| Indicator | Range | Label |
|-----------|-------|------------|
| 🟢 | 9-10 | Excellent |
| 🟡 | 7-8 | Good |
| 🟠 | 5-6 | Acceptable |
| 🔴 | 3-4 | Poor |
| ⚫ | 1-2 | Failed |
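The legend above can be read as a simple threshold mapping. The sketch below is illustrative (the function name and exact boundary handling are assumptions, not taken from the benchmark's source): scores at or above 7 are "Good", so a 6.9 falls into "Acceptable".

```python
def score_indicator(score: float) -> tuple[str, str]:
    """Map a 1-10 quality score to its (indicator, label) pair per the legend.

    Thresholds come from the legend table; boundary behavior (>=) is an
    assumption consistent with the rankings (e.g. 6.9 is shown as 🟠).
    """
    if score >= 9:
        return "🟢", "Excellent"
    if score >= 7:
        return "🟡", "Good"
    if score >= 5:
        return "🟠", "Acceptable"
    if score >= 3:
        return "🔴", "Poor"
    return "⚫", "Failed"
```

For example, `score_indicator(7.5)` yields `("🟡", "Good")`, matching the top-ranked model below, while `score_indicator(2.5)` yields `("⚫", "Failed")`.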

## Model Rankings

Overall performance across all tested languages (95 tests per model: 19 languages × 5 tests each):

| Rank | Model | Avg Score | Accuracy | Fluency | Style | Tests |
|------|-------|-----------|----------|---------|-------|-------|
| 1 | mistralai/mistral-medium-3.1 | 🟡 7.5 | 7.9 | 7.3 | 7.3 | 95 |
| 2 | gemma3:27b-it-qat | 🟡 7.1 | 7.6 | 7.0 | 6.8 | 95 |
| 3 | gemma3:27b | 🟡 7.1 | 7.7 | 7.1 | 6.8 | 95 |
| 4 | ministral-3:14b | 🟠 6.9 | 7.4 | 6.9 | 6.6 | 95 |
| 5 | qwen3:30b | 🟠 6.7 | 7.4 | 6.7 | 6.4 | 95 |
| 6 | gemma3:12b | 🟠 6.7 | 7.3 | 6.6 | 6.4 | 95 |
| 7 | qwen3:30b-instruct | 🟠 6.6 | 7.2 | 6.6 | 6.2 | 95 |
| 8 | mistral-small:24b | 🟠 6.4 | 7.2 | 6.4 | 6.2 | 95 |
| 9 | ministral-3 | 🟠 6.3 | 7.0 | 6.2 | 5.9 | 95 |
| 10 | qwen3:14b | 🟠 6.0 | 6.7 | 6.0 | 5.7 | 95 |
| 11 | qwen3:4b | 🟠 5.9 | 6.7 | 5.8 | 5.5 | 95 |
| 12 | gemma3:4b | 🟠 5.7 | 6.5 | 5.8 | 5.3 | 95 |
| 13 | qwen3:8b | 🟠 5.5 | 6.3 | 5.4 | 5.2 | 95 |
| 14 | llama3.1:8b | 🔴 4.2 | 4.9 | 4.2 | 3.8 | 95 |
| 15 | llama3.2 | ⚫ 2.5 | 3.4 | 2.5 | 2.3 | 95 |
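A ranking like the table above can be produced by averaging the three sub-scores and sorting. The sketch below is a hedged approximation: it assumes an unweighted mean of accuracy, fluency, and style, whereas the benchmark's actual aggregation may weight sub-scores or average per-test before rounding (some rows differ slightly from a plain mean). The function and sample dictionary are illustrative, not the benchmark's code.

```python
def rank_models(scores: dict[str, tuple[float, float, float]]) -> list[tuple[str, float]]:
    """scores maps model name -> (accuracy, fluency, style).

    Returns (model, avg) pairs sorted best-first, using an unweighted
    mean rounded to one decimal (an assumption about the aggregation).
    """
    averaged = {model: round(sum(subs) / 3, 1) for model, subs in scores.items()}
    return sorted(averaged.items(), key=lambda pair: pair[1], reverse=True)

# Two rows from the table above, as sample input.
sample = {
    "mistralai/mistral-medium-3.1": (7.9, 7.3, 7.3),
    "llama3.2": (3.4, 2.5, 2.3),
}
```

Here `rank_models(sample)` places mistral-medium-3.1 first with 7.5, matching the table; for llama3.2 the plain mean gives 2.7 against the table's 2.5, which is why the exact aggregation is flagged as an assumption.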

## Language Rankings (Top 15)

Best translation quality by target language:

| Rank | Language | Native Name | Avg Score | Best Model | Tests |
|------|----------|-------------|-----------|------------|-------|
| 1 | Spanish | Español | 🟡 7.2 | mistralai/mistral-medium-3.1 | 75 |
| 2 | French | Français | 🟡 7.1 | mistralai/mistral-medium-3.1 | 75 |
| 3 | Portuguese | Português | 🟡 7.1 | qwen3:30b-instruct | 75 |
| 4 | Italian | Italiano | 🟠 6.9 | gemma3:27b | 75 |
| 5 | Chinese (Traditional) | 繁體中文 | 🟠 6.8 | qwen3:30b | 75 |
| 6 | Chinese (Simplified) | 简体中文 | 🟠 6.8 | qwen3:30b-instruct | 75 |
| 7 | German | Deutsch | 🟠 6.7 | ministral-3:14b | 75 |
| 8 | Russian | Русский | 🟠 6.4 | mistralai/mistral-medium-3.1 | 75 |
| 9 | Vietnamese | Tiếng Việt | 🟠 6.3 | mistralai/mistral-medium-3.1 | 75 |
| 10 | Polish | Polski | 🟠 6.1 | mistralai/mistral-medium-3.1 | 75 |
| 11 | Ukrainian | Українська | 🟠 5.9 | mistralai/mistral-medium-3.1 | 75 |
| 12 | Thai | ไทย | 🟠 5.9 | mistralai/mistral-medium-3.1 | 75 |
| 13 | Arabic | العربية | 🟠 5.8 | gemma3:27b-it-qat | 75 |
| 14 | Japanese | 日本語 | 🟠 5.6 | gemma3:27b | 75 |
| 15 | Hindi | हिन्दी | 🟠 5.5 | gemma3:27b | 75 |



## Quick Stats

- **Total Models Tested:** 15
- **Total Languages:** 19
- **Total Translations:** 1425
- **Evaluator Model:** anthropic/claude-haiku-4.5
- **Source Language:** English
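The stats above are internally consistent: each model is tested 5 times per language (inferred from the tables, since 95 tests per model ÷ 19 languages = 5 and 75 tests per language ÷ 15 models = 5). A quick sanity check, with the per-pair count as the one inferred value:

```python
# Dimensions reported on this page.
models = 15
languages = 19
tests_per_pair = 5  # inferred: 95 / 19 == 75 / 15 == 5

total_translations = models * languages * tests_per_pair  # 1425, matching "Total Translations"
tests_per_model = languages * tests_per_pair              # 95, the "Tests" column in Model Rankings
tests_per_language = models * tests_per_pair              # 75, the "Tests" column in Language Rankings
```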

## Categories

### By Language Category

#### European Major Languages

| Language | Avg Score | Best Model |
|----------|-----------|------------|
| Spanish | 🟡 7.2 | mistralai/mistral-medium-3.1 |
| French | 🟡 7.1 | mistralai/mistral-medium-3.1 |
| Portuguese | 🟡 7.1 | qwen3:30b-instruct |
| Italian | 🟠 6.9 | gemma3:27b |
| German | 🟠 6.7 | ministral-3:14b |
| Polish | 🟠 6.1 | mistralai/mistral-medium-3.1 |

#### Asian Languages

| Language | Avg Score | Best Model |
|----------|-----------|------------|
| Chinese (Traditional) | 🟠 6.8 | qwen3:30b |
| Chinese (Simplified) | 🟠 6.8 | qwen3:30b-instruct |
| Vietnamese | 🟠 6.3 | mistralai/mistral-medium-3.1 |
| Thai | 🟠 5.9 | mistralai/mistral-medium-3.1 |
| Japanese | 🟠 5.6 | gemma3:27b |
| Hindi | 🟠 5.5 | gemma3:27b |
| Korean | 🟠 5.5 | mistralai/mistral-medium-3.1 |
| Tamil | 🔴 4.7 | mistralai/mistral-medium-3.1 |
| Bengali | 🔴 4.7 | mistralai/mistral-medium-3.1 |

#### Cyrillic Languages

| Language | Avg Score | Best Model |
|----------|-----------|------------|
| Russian | 🟠 6.4 | mistralai/mistral-medium-3.1 |
| Ukrainian | 🟠 5.9 | mistralai/mistral-medium-3.1 |

#### Semitic Languages

| Language | Avg Score | Best Model |
|----------|-----------|------------|
| Arabic | 🟠 5.8 | gemma3:27b-it-qat |
| Hebrew | 🔴 4.5 | gemma3:27b-it-qat |



_Generated by the TranslateBookWithLLM benchmark system_