
Conversation

@ilopezluna
Contributor

No description provided.

@gemini-code-assist
Contributor

Summary of Changes

Hello @ilopezluna, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces comprehensive documentation and performance benchmarks for the Kimi K2 Thinking model. It provides detailed insights into the model's architecture, key features, and its performance across a wide array of reasoning, general, agentic search, and coding tasks, facilitating a better understanding of its capabilities and usage. Separate documentation is provided for both the vLLM and GGUF versions of the model.

Highlights

  • New Model Documentation: Comprehensive documentation for the Kimi K2 Thinking model has been added, detailing its architecture, key features, and usage.
  • Detailed Benchmarks: Extensive benchmark results are included, showcasing the Kimi K2 Thinking model's performance across various reasoning, general, agentic search, and coding tasks, comparing it against other leading models.
  • Model Specifications: Detailed specifications of the Kimi K2 model, such as its Mixture-of-Experts (MoE) architecture, parameter count, context length, and activation function, are now documented.
  • Docker Integration: Instructions for running the Kimi K2 model using the Docker Model Runner have been provided for both vLLM and GGUF versions; the exact commands are reproduced after this list.
  • GGUF Version Information: Specific documentation for the Kimi K2 GGUF version by Unsloth has been added, including relevant Hugging Face and Unsloth documentation links.
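
For quick reference, the run commands from the two pages under review are shown below (assuming, as the file names in the review comments suggest, that ai/kimi-k2.md is the GGUF page and ai/kimi-k2-vllm.md is the vLLM page):

```bash
# GGUF (Unsloth) version, from ai/kimi-k2.md
docker model run kimi-k2

# vLLM version, from ai/kimi-k2-vllm.md
docker model run kimi-k2-vllm
```
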
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
|---------|---------|-------------|
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |
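
Any of these commands can simply be posted as a pull request comment; the snippet below is a hypothetical alternative using the GitHub CLI (the PR number placeholder is illustrative):

```bash
# Illustrative only: post "/gemini review" as a PR comment via the GitHub CLI.
# Replace <PR_NUMBER> with the pull request number.
gh pr comment <PR_NUMBER> --body "/gemini review"
```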

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
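
A minimal sketch of what that setup might look like is below (the styleguide.md file name and its contents are assumptions, not confirmed in this thread; see the linked instructions for the exact files and schema):

```bash
# Hypothetical setup: add a custom review style guide under .gemini/ at the repo root.
# File name and format are assumptions; consult the Gemini Code Assist docs.
mkdir -p .gemini
cat > .gemini/styleguide.md <<'EOF'
# Review style guide
- Python code should follow PEP-8.
- Model documentation pages should follow template.md.
EOF
git add .gemini && git commit -m "Add Gemini Code Assist style guide"
```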

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.

Contributor

@sourcery-ai sourcery-ai bot left a comment

Hey there - I've reviewed your changes - here's some feedback:

  • The two new docs are almost identical; consider extracting the shared description/benchmark sections into a single canonical Kimi K2 page and keeping only the runner-specific bits (GGUF vs vLLM) separate to avoid duplication and future drift.
  • It may be helpful to make the distinction between the GGUF and vLLM variants more obvious in the titles and introductory text (e.g., explicitly label one as 'GGUF (Unsloth)' and the other as 'vLLM server' near the top) so users immediately see which page applies to their setup.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The two new docs are almost identical; consider extracting the shared description/benchmark sections into a single canonical Kimi K2 page and keeping only the runner-specific bits (GGUF vs vLLM) separate to avoid duplication and future drift.
- It may be helpful to make the distinction between the GGUF and vLLM variants more obvious in the titles and introductory text (e.g., explicitly label one as 'GGUF (Unsloth)' and the other as 'vLLM server' near the top) so users immediately see which page applies to their setup.

## Individual Comments

### Comment 1
<location> `ai/kimi-k2.md:6-7` </location>
<code_context>
+![logo](https://statics.moonshot.cn/kimi-blog/assets/logo-CvjirWOb.svg)
+
+## Description
+Kimi K2 Thinking is the latest, most capable version of open-source thinking model. Starting with Kimi K2, we built it as a thinking agent that reasons step-by-step while dynamically invoking tools. It sets a new state-of-the-art on Humanity's Last Exam (HLE), BrowseComp, and other benchmarks by dramatically scaling multi-step reasoning depth and maintaining stable tool-use across 200–300 sequential calls. At the same time, K2 Thinking is a native INT4 quantization model with 256k context window, achieving lossless reductions in inference latency and GPU memory usage.
+
+
</code_context>

<issue_to_address>
**issue (typo):** Consider fixing the article usage and the 'k/K' inconsistency in this description sentence.

For example: "Kimi K2 Thinking is the latest, most capable version of an open-source thinking model ... with a 256K context window," which adds the missing article before "open-source thinking model" and matches the "256K" capitalization used later in the table.

```suggestion
## Description
Kimi K2 Thinking is the latest, most capable version of an open-source thinking model. Starting with Kimi K2, we built it as a thinking agent that reasons step-by-step while dynamically invoking tools. It sets a new state-of-the-art on Humanity's Last Exam (HLE), BrowseComp, and other benchmarks by dramatically scaling multi-step reasoning depth and maintaining stable tool-use across 200–300 sequential calls. At the same time, K2 Thinking is a native INT4 quantization model with a 256K context window, achieving lossless reductions in inference latency and GPU memory usage.
```
</issue_to_address>

### Comment 2
<location> `ai/kimi-k2.md:12` </location>
<code_context>
+
+## Key Features
+- **Deep Thinking & Tool Orchestration:** End-to-end trained to interleave chain-of-thought reasoning with function calls, enabling autonomous research, coding, and writing workflows that last hundreds of steps without drift.
+- **Native INT4 Quantization:** Quantization-Aware Training (QAT) is employed in post-training stage to achieve lossless 2x speed-up in low-latency mode.
+- **Stable Long-Horizon Agency:** Maintains coherent goal-directed behavior across up to 200–300 consecutive tool invocations, surpassing prior models that degrade after 30–50 steps.
+
</code_context>

<issue_to_address>
**nitpick (typo):** Add a definite article in 'in post-training stage' for smoother grammar.

For example: "is employed in the post-training stage" or "during the post-training stage".

```suggestion
- **Native INT4 Quantization:** Quantization-Aware Training (QAT) is employed in the post-training stage to achieve lossless 2x speed-up in low-latency mode.
```
</issue_to_address>

### Comment 3
<location> `ai/kimi-k2-vllm.md:6` </location>
<code_context>
+![logo](https://statics.moonshot.cn/kimi-blog/assets/logo-CvjirWOb.svg)
+
+## Description
+Kimi K2 Thinking is the latest, most capable version of open-source thinking model. Starting with Kimi K2, we built it as a thinking agent that reasons step-by-step while dynamically invoking tools. It sets a new state-of-the-art on Humanity's Last Exam (HLE), BrowseComp, and other benchmarks by dramatically scaling multi-step reasoning depth and maintaining stable tool-use across 200–300 sequential calls. At the same time, K2 Thinking is a native INT4 quantization model with 256k context window, achieving lossless reductions in inference latency and GPU memory usage.
+
+
</code_context>

<issue_to_address>
**issue (typo):** Same as the other file: fix the missing article and '256k' capitalization in this sentence.

Use "version of an open-source thinking model" and "a 256K context window" to align grammar and capitalization with the table.

```suggestion
Kimi K2 Thinking is the latest, most capable version of an open-source thinking model. Starting with Kimi K2, we built it as a thinking agent that reasons step-by-step while dynamically invoking tools. It sets a new state-of-the-art on Humanity's Last Exam (HLE), BrowseComp, and other benchmarks by dramatically scaling multi-step reasoning depth and maintaining stable tool-use across 200–300 sequential calls. At the same time, K2 Thinking is a native INT4 quantization model with a 256K context window, achieving lossless reductions in inference latency and GPU memory usage.
```
</issue_to_address>

### Comment 4
<location> `ai/kimi-k2-vllm.md:11` </location>
<code_context>
+
+## Key Features
+- **Deep Thinking & Tool Orchestration:** End-to-end trained to interleave chain-of-thought reasoning with function calls, enabling autonomous research, coding, and writing workflows that last hundreds of steps without drift.
+- **Native INT4 Quantization:** Quantization-Aware Training (QAT) is employed in post-training stage to achieve lossless 2x speed-up in low-latency mode.
+- **Stable Long-Horizon Agency:** Maintains coherent goal-directed behavior across up to 200–300 consecutive tool invocations, surpassing prior models that degrade after 30–50 steps.
+
</code_context>

<issue_to_address>
**nitpick (typo):** As in the other file, 'in post-training stage' reads more naturally with 'the'.

```suggestion
- **Native INT4 Quantization:** Quantization-Aware Training (QAT) is employed in the post-training stage to achieve lossless 2x speed-up in low-latency mode.
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +6 to +7
## Description
Kimi K2 Thinking is the latest, most capable version of open-source thinking model. Starting with Kimi K2, we built it as a thinking agent that reasons step-by-step while dynamically invoking tools. It sets a new state-of-the-art on Humanity's Last Exam (HLE), BrowseComp, and other benchmarks by dramatically scaling multi-step reasoning depth and maintaining stable tool-use across 200–300 sequential calls. At the same time, K2 Thinking is a native INT4 quantization model with 256k context window, achieving lossless reductions in inference latency and GPU memory usage.
Contributor

issue (typo): Consider fixing the article usage and the 'k/K' inconsistency in this description sentence.

For example: "Kimi K2 Thinking is the latest, most capable version of an open-source thinking model ... with a 256K context window," which adds the missing article before "open-source thinking model" and matches the "256K" capitalization used later in the table.

Suggested change
## Description
Kimi K2 Thinking is the latest, most capable version of open-source thinking model. Starting with Kimi K2, we built it as a thinking agent that reasons step-by-step while dynamically invoking tools. It sets a new state-of-the-art on Humanity's Last Exam (HLE), BrowseComp, and other benchmarks by dramatically scaling multi-step reasoning depth and maintaining stable tool-use across 200–300 sequential calls. At the same time, K2 Thinking is a native INT4 quantization model with 256k context window, achieving lossless reductions in inference latency and GPU memory usage.
## Description
Kimi K2 Thinking is the latest, most capable version of an open-source thinking model. Starting with Kimi K2, we built it as a thinking agent that reasons step-by-step while dynamically invoking tools. It sets a new state-of-the-art on Humanity's Last Exam (HLE), BrowseComp, and other benchmarks by dramatically scaling multi-step reasoning depth and maintaining stable tool-use across 200–300 sequential calls. At the same time, K2 Thinking is a native INT4 quantization model with a 256K context window, achieving lossless reductions in inference latency and GPU memory usage.


## Key Features
- **Deep Thinking & Tool Orchestration:** End-to-end trained to interleave chain-of-thought reasoning with function calls, enabling autonomous research, coding, and writing workflows that last hundreds of steps without drift.
- **Native INT4 Quantization:** Quantization-Aware Training (QAT) is employed in post-training stage to achieve lossless 2x speed-up in low-latency mode.
Contributor

nitpick (typo): Add a definite article in 'in post-training stage' for smoother grammar.

For example: "is employed in the post-training stage" or "during the post-training stage".

Suggested change
- **Native INT4 Quantization:** Quantization-Aware Training (QAT) is employed in post-training stage to achieve lossless 2x speed-up in low-latency mode.
- **Native INT4 Quantization:** Quantization-Aware Training (QAT) is employed in the post-training stage to achieve lossless 2x speed-up in low-latency mode.

![logo](https://statics.moonshot.cn/kimi-blog/assets/logo-CvjirWOb.svg)

## Description
Kimi K2 Thinking is the latest, most capable version of open-source thinking model. Starting with Kimi K2, we built it as a thinking agent that reasons step-by-step while dynamically invoking tools. It sets a new state-of-the-art on Humanity's Last Exam (HLE), BrowseComp, and other benchmarks by dramatically scaling multi-step reasoning depth and maintaining stable tool-use across 200–300 sequential calls. At the same time, K2 Thinking is a native INT4 quantization model with 256k context window, achieving lossless reductions in inference latency and GPU memory usage.
Contributor

issue (typo): Same as the other file: fix the missing article and '256k' capitalization in this sentence.

Use "version of an open-source thinking model" and "a 256K context window" to align grammar and capitalization with the table.

Suggested change
Kimi K2 Thinking is the latest, most capable version of open-source thinking model. Starting with Kimi K2, we built it as a thinking agent that reasons step-by-step while dynamically invoking tools. It sets a new state-of-the-art on Humanity's Last Exam (HLE), BrowseComp, and other benchmarks by dramatically scaling multi-step reasoning depth and maintaining stable tool-use across 200–300 sequential calls. At the same time, K2 Thinking is a native INT4 quantization model with 256k context window, achieving lossless reductions in inference latency and GPU memory usage.
Kimi K2 Thinking is the latest, most capable version of an open-source thinking model. Starting with Kimi K2, we built it as a thinking agent that reasons step-by-step while dynamically invoking tools. It sets a new state-of-the-art on Humanity's Last Exam (HLE), BrowseComp, and other benchmarks by dramatically scaling multi-step reasoning depth and maintaining stable tool-use across 200–300 sequential calls. At the same time, K2 Thinking is a native INT4 quantization model with a 256K context window, achieving lossless reductions in inference latency and GPU memory usage.


## Key Features
- **Deep Thinking & Tool Orchestration:** End-to-end trained to interleave chain-of-thought reasoning with function calls, enabling autonomous research, coding, and writing workflows that last hundreds of steps without drift.
- **Native INT4 Quantization:** Quantization-Aware Training (QAT) is employed in post-training stage to achieve lossless 2x speed-up in low-latency mode.
Contributor

nitpick (typo): As in the other file, 'in post-training stage' reads more naturally with 'the'.

Suggested change
- **Native INT4 Quantization:** Quantization-Aware Training (QAT) is employed in post-training stage to achieve lossless 2x speed-up in low-latency mode.
- **Native INT4 Quantization:** Quantization-Aware Training (QAT) is employed in the post-training stage to achieve lossless 2x speed-up in low-latency mode.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds documentation for the Kimi K2 model. While the information is valuable, there are a few major issues. The two new files, ai/kimi-k2-vllm.md and ai/kimi-k2.md, have almost identical content, which will create a maintenance burden. I recommend consolidating them into a single file with sections for each variant (vLLM and GGUF) if needed, perhaps using the Available model variants section from the template. More importantly, neither file follows the repository's template.md for model documentation. I've left specific comments on how to align with the template. Additionally, the benchmark tables contain unexplained asterisks, which makes the data difficult to interpret. Please address these points to ensure consistency and clarity.

Comment on lines +1 to +92
# Kimi K2

![logo](https://statics.moonshot.cn/kimi-blog/assets/logo-CvjirWOb.svg)

## Description
Kimi K2 Thinking is the latest, most capable version of open-source thinking model. Starting with Kimi K2, we built it as a thinking agent that reasons step-by-step while dynamically invoking tools. It sets a new state-of-the-art on Humanity's Last Exam (HLE), BrowseComp, and other benchmarks by dramatically scaling multi-step reasoning depth and maintaining stable tool-use across 200–300 sequential calls. At the same time, K2 Thinking is a native INT4 quantization model with 256k context window, achieving lossless reductions in inference latency and GPU memory usage.


## Key Features
- **Deep Thinking & Tool Orchestration:** End-to-end trained to interleave chain-of-thought reasoning with function calls, enabling autonomous research, coding, and writing workflows that last hundreds of steps without drift.
- **Native INT4 Quantization:** Quantization-Aware Training (QAT) is employed in post-training stage to achieve lossless 2x speed-up in low-latency mode.
- **Stable Long-Horizon Agency:** Maintains coherent goal-directed behavior across up to 200–300 consecutive tool invocations, surpassing prior models that degrade after 30–50 steps.

| **Field** | **Value** |
|-----------------------------------------|--------------------------|
| Architecture | Mixture-of-Experts (MoE) |
| Total Parameters | 1T |
| Activated Parameters | 32B |
| Number of Layers (Dense layer included) | 61 |
| Number of Dense Layers | 1 |
| Attention Hidden Dimension | 7168 |
| MoE Hidden Dimension (per Expert) | 2048 |
| Number of Attention Heads | 64 |
| Number of Experts | 384 |
| Selected Experts per Token | 8 |
| Number of Shared Experts | 1 |
| Vocabulary Size | 160K |
| Context Length | 256K |
| Attention Mechanism | MLA |
| Activation Function | SwiGLU |


## Use this AI model with Docker Model Runner

```bash
docker model run kimi-k2-vllm
```

## Benchmarks

### Reasoning Tasks
| Benchmark | Setting | K2 Thinking | GPT-5 (High) | Claude Sonnet 4.5 | K2 0905 (Thinking) | DeepSeek-V3.2 | Grok-4 |
|-----------------|-----------|-------------|--------------|-------------------|--------------------|---------------|--------|
| HLE | no tools | 23.9 | 26.3 | 19.8* | 7.9 | 19.8 | 25.4 |
| HLE | w/ tools | 44.9 | 41.7* | 32.0* | 21.7 | 20.3* | 41.0 |
| HLE | heavy | 51.0 | 42.0 | - | - | - | 50.7 |
| AIME25 | no tools | 94.5 | 94.6 | 87.0 | 51.0 | 89.3 | 91.7 |
| AIME25 | w/ python | 99.1 | 99.6 | 100.0 | 75.2 | 58.1* | 98.8 |
| AIME25 | heavy | 100.0 | 100.0 | - | - | - | 100.0 |
| HMMT25 | no tools | 89.4 | 93.3 | 74.6* | 38.8 | 83.6 | 90.0 |
| HMMT25 | w/ python | 95.1 | 96.7 | 88.8* | 70.4 | 49.5* | 93.9 |
| HMMT25 | heavy | 97.5 | 100.0 | - | - | - | 96.7 |
| IMO-AnswerBench | no tools | 78.6 | 76.0* | 65.9* | 45.8 | 76.0* | 73.1 |
| GPQA | no tools | 84.5 | 85.7 | 83.4 | 74.2 | 79.9 | 87.5 |


### General Tasks

| Benchmark | Setting | K2 Thinking | GPT-5 (High) | Claude Sonnet 4.5 | K2 0905 (Thinking) | DeepSeek-V3.2 |
|------------------|----------|-------------|--------------|-------------------|--------------------|---------------|
| MMLU-Pro | no tools | 84.6 | 87.1 | 87.5 | 81.9 | 85.0 |
| MMLU-Redux | no tools | 94.4 | 95.3 | 95.6 | 92.7 | 93.7 |
| Longform Writing | no tools | 73.8 | 71.4 | 79.8 | 62.8 | 72.5 |
| HealthBench | no tools | 58.0 | 67.2 | 44.2 | 43.8 | 46.9 |


### Agentic Search Tasks

| Benchmark | Setting | K2 Thinking | GPT-5 (High) | Claude Sonnet 4.5 | K2 0905 (Thinking) | DeepSeek-V3.2 |
|------------------|----------|-------------|--------------|-------------------|--------------------|---------------|
| BrowseComp | w/ tools | 60.2 | 54.9 | 24.1 | 7.4 | 40.1 |
| BrowseComp-ZH | w/ tools | 62.3 | 63.0* | 42.4* | 22.2 | 47.9 |
| Seal-0 | w/ tools | 56.3 | 51.4* | 53.4* | 25.2 | 38.5* |
| FinSearchComp-T3 | w/ tools | 47.4 | 48.5* | 44.0* | 10.4 | 27.0* |
| Frames | w/ tools | 87.0 | 86.0* | 85.0* | 58.1 | 80.2* |


### Coding Tasks

| Benchmark | Setting | K2 Thinking | GPT-5 (High) | Claude Sonnet 4.5 | K2 0905 (Thinking) | DeepSeek-V3.2 |
|------------------------|---------------------------|-------------|--------------|-------------------|--------------------|---------------|
| SWE-bench Verified | w/ tools | 71.3 | 74.9 | 77.2 | 69.2 | 67.8 |
| SWE-bench Multilingual | w/ tools | 61.1 | 55.3* | 68.0 | 55.9 | 57.9 |
| Multi-SWE-bench | w/ tools | 41.9 | 39.3* | 44.3 | 33.5 | 30.6 |
| SciCode | no tools | 44.8 | 42.9 | 44.7 | 30.7 | 37.7 |
| LiveCodeBenchV6 | no tools | 83.1 | 87.0* | 64.0* | 56.1* | 74.1 |
| OJ-Bench (cpp) | no tools | 48.7 | 56.2* | 30.4* | 25.5* | 38.2* |
| Terminal-Bench | w/ simulated tools (JSON) | 47.1 | 43.8 | 51.0 | 44.5 | 37.7 |

## Links
- https://moonshotai.github.io/Kimi-K2/thinking.html
- https://huggingface.co/moonshotai/Kimi-K2-Thinking
Contributor

high

The structure of this document does not follow the established template.md for model pages. For consistency across the project, please restructure this file to match the template. Key missing or mismatched sections include:

  • Characteristics: The provided table is different. Please use the format from the template and include fields like Provider, Cutoff date, Languages, Tool calling, License, etc.
  • Available model variants: This section is missing.
  • Considerations: This section is missing.
  • Benchmark performance: The format of benchmark reporting is different from the template.

Adhering to the template is important for maintainability and user experience.

Comment on lines +1 to +95
# Kimi K2
*GGUF version by Unsloth*

![logo](https://statics.moonshot.cn/kimi-blog/assets/logo-CvjirWOb.svg)

## Description
Kimi K2 Thinking is the latest, most capable version of open-source thinking model. Starting with Kimi K2, we built it as a thinking agent that reasons step-by-step while dynamically invoking tools. It sets a new state-of-the-art on Humanity's Last Exam (HLE), BrowseComp, and other benchmarks by dramatically scaling multi-step reasoning depth and maintaining stable tool-use across 200–300 sequential calls. At the same time, K2 Thinking is a native INT4 quantization model with 256k context window, achieving lossless reductions in inference latency and GPU memory usage.


## Key Features
- **Deep Thinking & Tool Orchestration:** End-to-end trained to interleave chain-of-thought reasoning with function calls, enabling autonomous research, coding, and writing workflows that last hundreds of steps without drift.
- **Native INT4 Quantization:** Quantization-Aware Training (QAT) is employed in post-training stage to achieve lossless 2x speed-up in low-latency mode.
- **Stable Long-Horizon Agency:** Maintains coherent goal-directed behavior across up to 200–300 consecutive tool invocations, surpassing prior models that degrade after 30–50 steps.

| **Field** | **Value** |
|-----------------------------------------|--------------------------|
| Architecture | Mixture-of-Experts (MoE) |
| Total Parameters | 1T |
| Activated Parameters | 32B |
| Number of Layers (Dense layer included) | 61 |
| Number of Dense Layers | 1 |
| Attention Hidden Dimension | 7168 |
| MoE Hidden Dimension (per Expert) | 2048 |
| Number of Attention Heads | 64 |
| Number of Experts | 384 |
| Selected Experts per Token | 8 |
| Number of Shared Experts | 1 |
| Vocabulary Size | 160K |
| Context Length | 256K |
| Attention Mechanism | MLA |
| Activation Function | SwiGLU |


## Use this AI model with Docker Model Runner

```bash
docker model run kimi-k2
```

## Benchmarks

### Reasoning Tasks
| Benchmark | Setting | K2 Thinking | GPT-5 (High) | Claude Sonnet 4.5 | K2 0905 (Thinking) | DeepSeek-V3.2 | Grok-4 |
|-----------------|-----------|-------------|--------------|-------------------|--------------------|---------------|--------|
| HLE | no tools | 23.9 | 26.3 | 19.8* | 7.9 | 19.8 | 25.4 |
| HLE | w/ tools | 44.9 | 41.7* | 32.0* | 21.7 | 20.3* | 41.0 |
| HLE | heavy | 51.0 | 42.0 | - | - | - | 50.7 |
| AIME25 | no tools | 94.5 | 94.6 | 87.0 | 51.0 | 89.3 | 91.7 |
| AIME25 | w/ python | 99.1 | 99.6 | 100.0 | 75.2 | 58.1* | 98.8 |
| AIME25 | heavy | 100.0 | 100.0 | - | - | - | 100.0 |
| HMMT25 | no tools | 89.4 | 93.3 | 74.6* | 38.8 | 83.6 | 90.0 |
| HMMT25 | w/ python | 95.1 | 96.7 | 88.8* | 70.4 | 49.5* | 93.9 |
| HMMT25 | heavy | 97.5 | 100.0 | - | - | - | 96.7 |
| IMO-AnswerBench | no tools | 78.6 | 76.0* | 65.9* | 45.8 | 76.0* | 73.1 |
| GPQA | no tools | 84.5 | 85.7 | 83.4 | 74.2 | 79.9 | 87.5 |


### General Tasks

| Benchmark | Setting | K2 Thinking | GPT-5 (High) | Claude Sonnet 4.5 | K2 0905 (Thinking) | DeepSeek-V3.2 |
|------------------|----------|-------------|--------------|-------------------|--------------------|---------------|
| MMLU-Pro | no tools | 84.6 | 87.1 | 87.5 | 81.9 | 85.0 |
| MMLU-Redux | no tools | 94.4 | 95.3 | 95.6 | 92.7 | 93.7 |
| Longform Writing | no tools | 73.8 | 71.4 | 79.8 | 62.8 | 72.5 |
| HealthBench | no tools | 58.0 | 67.2 | 44.2 | 43.8 | 46.9 |


### Agentic Search Tasks

| Benchmark | Setting | K2 Thinking | GPT-5 (High) | Claude Sonnet 4.5 | K2 0905 (Thinking) | DeepSeek-V3.2 |
|------------------|----------|-------------|--------------|-------------------|--------------------|---------------|
| BrowseComp | w/ tools | 60.2 | 54.9 | 24.1 | 7.4 | 40.1 |
| BrowseComp-ZH | w/ tools | 62.3 | 63.0* | 42.4* | 22.2 | 47.9 |
| Seal-0 | w/ tools | 56.3 | 51.4* | 53.4* | 25.2 | 38.5* |
| FinSearchComp-T3 | w/ tools | 47.4 | 48.5* | 44.0* | 10.4 | 27.0* |
| Frames | w/ tools | 87.0 | 86.0* | 85.0* | 58.1 | 80.2* |


### Coding Tasks

| Benchmark | Setting | K2 Thinking | GPT-5 (High) | Claude Sonnet 4.5 | K2 0905 (Thinking) | DeepSeek-V3.2 |
|------------------------|---------------------------|-------------|--------------|-------------------|--------------------|---------------|
| SWE-bench Verified | w/ tools | 71.3 | 74.9 | 77.2 | 69.2 | 67.8 |
| SWE-bench Multilingual | w/ tools | 61.1 | 55.3* | 68.0 | 55.9 | 57.9 |
| Multi-SWE-bench | w/ tools | 41.9 | 39.3* | 44.3 | 33.5 | 30.6 |
| SciCode | no tools | 44.8 | 42.9 | 44.7 | 30.7 | 37.7 |
| LiveCodeBenchV6 | no tools | 83.1 | 87.0* | 64.0* | 56.1* | 74.1 |
| OJ-Bench (cpp) | no tools | 48.7 | 56.2* | 30.4* | 25.5* | 38.2* |
| Terminal-Bench | w/ simulated tools (JSON) | 47.1 | 43.8 | 51.0 | 44.5 | 37.7 |

## Links
- https://moonshotai.github.io/Kimi-K2/thinking.html
- https://huggingface.co/moonshotai/Kimi-K2-Thinking
- [Hugging Face (Unsloth GGUF)](https://huggingface.co/unsloth/Kimi-K2-Thinking-GGUF)
- [Unsloth Dynamic 2.0 GGUF](https://docs.unsloth.ai/basics/unsloth-dynamic-2.0-ggufs)
Contributor

high

The structure of this document does not follow the established template.md for model pages. For consistency across the project, please restructure this file to match the template. Key missing or mismatched sections include:

  • Characteristics: The provided table is different. Please use the format from the template and include fields like Provider, Cutoff date, Languages, Tool calling, License, etc.
  • Available model variants: This section is missing. The subtitle on line 2 (*GGUF version by Unsloth*) should likely be part of this section.
  • Considerations: This section is missing.
  • Benchmark performance: The format of benchmark reporting is different from the template.

Adhering to the template is important for maintainability and user experience.

## Benchmarks

### Reasoning Tasks
| Benchmark | Setting | K2 Thinking | GPT-5 (High) | Claude Sonnet 4.5 | K2 0905 (Thinking) | DeepSeek-V3.2 | Grok-4 |
Contributor

medium

The benchmark tables use asterisks (*) next to some values (e.g., 19.8* on line 44), but there's no explanation for what the asterisk signifies. Please add a footnote or a note to clarify its meaning. Without this, the benchmark data is ambiguous.

## Benchmarks

### Reasoning Tasks
| Benchmark | Setting | K2 Thinking | GPT-5 (High) | Claude Sonnet 4.5 | K2 0905 (Thinking) | DeepSeek-V3.2 | Grok-4 |
Contributor

medium

The benchmark tables use asterisks (*) next to some values (e.g., 19.8* on line 45), but there's no explanation for what the asterisk signifies. Please add a footnote or a note to clarify its meaning. Without this, the benchmark data is ambiguous.
