[WIP][TSS][Model] Kimi-Audio-7B #2941

Draft
zhangj1an wants to merge 28 commits into vllm-project:main from zhangj1an:jian/kimiaudio

Conversation

@zhangj1an
Contributor

@zhangj1an zhangj1an commented Apr 20, 2026


Latest Status [27 Apr]

Benchmarks are still running.

Purpose

Closes #1824.

Kimi-Audio consists of three main components:
[Figure: Kimi-Audio framework overview (kimia_framework)]

  1. Audio Tokenizer: Converts input audio into:
    • Discrete semantic tokens (12.5Hz) using vector quantization.
    • Continuous acoustic features derived from a Whisper encoder (downsampled to 12.5Hz).
  2. Audio LLM: A transformer-based model (initialized from a pre-trained text LLM like Qwen 2.5 7B) with shared layers processing multimodal inputs, followed by parallel heads for autoregressively generating text tokens and discrete audio semantic tokens.
  3. Audio Detokenizer: Converts the predicted discrete semantic audio tokens back into high-fidelity waveforms using a flow-matching model and a vocoder (BigVGAN), supporting chunk-wise streaming with a look-ahead mechanism for low latency.
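The 12.5 Hz token rate and the chunk-wise streaming with look-ahead described above can be illustrated with a small sketch. This is not code from this PR: the function names `num_semantic_tokens` and `plan_chunks`, and the `chunk_size` and `look_ahead` values, are hypothetical and chosen only for illustration. The actual detokenizer's chunking parameters are not stated in this description.

```python
import math
from typing import Iterator, Tuple

TOKEN_RATE_HZ = 12.5  # semantic-token rate stated in the PR description


def num_semantic_tokens(duration_s: float, rate_hz: float = TOKEN_RATE_HZ) -> int:
    """Number of tokens needed to cover `duration_s` seconds of audio."""
    return math.ceil(duration_s * rate_hz)


def plan_chunks(
    num_tokens: int, chunk_size: int, look_ahead: int
) -> Iterator[Tuple[int, int, int]]:
    """Yield (start, end, context_end) index triples for streaming.

    Tokens [start, end) are rendered to audio in this step. Tokens
    [end, context_end) are look-ahead context only: they are visible to
    the step but rendered by a later one. All sizes are hypothetical.
    """
    if chunk_size <= 0 or look_ahead < 0:
        raise ValueError("chunk_size must be > 0 and look_ahead >= 0")
    for start in range(0, num_tokens, chunk_size):
        end = min(start + chunk_size, num_tokens)
        context_end = min(end + look_ahead, num_tokens)
        yield start, end, context_end


if __name__ == "__main__":
    n = num_semantic_tokens(10.0)  # 10 s of audio at 12.5 Hz
    print(n, "tokens")
    for start, end, context_end in plan_chunks(n, chunk_size=30, look_ahead=5):
        print(f"render [{start}, {end}) with context up to {context_end}")
```

A larger look-ahead gives each chunk more future context at the cost of latency, since a chunk cannot be rendered until its look-ahead tokens have been generated.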

The model supports three tasks:

  • ASR (audio → text)
  • audio-to-audio chat (audio in → audio + text out)
  • multi-turn audio conversation

Key Adaptations

  • Transformer: reuses the Qwen2 decoder.
  • Vocoder: keeps the original Kimi BigVGAN implementation, because switching to the Qwen2 BigVGAN vocoder degrades audio generation quality.

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts & test commands. Please state the reasons if your codes don't require additional test scripts. For test file guidelines, please check the test style doc
  • The test results. Please paste the results comparison before and after, or the e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)

Signed-off-by: Zhang Jian <jianmusings@gmail.com>
@hsliuustc0106
Collaborator

Ready for full review when draft status removed. Preliminary scan available on request.

Development

Successfully merging this pull request may close these issues.

[New Model]: Kimi-Audio-7B

2 participants