Chinese version: README_CN
ncnn_llm provides Large Language Model (LLM) support for the ncnn framework.
ncnn is a high-performance neural network inference framework specifically optimized for mobile and embedded devices. By integrating LLMs into ncnn, this project enables the execution of complex natural language processing tasks in resource-constrained environments (edge devices, mobile phones, IoT).
This project originated from nihui's implementation of the kvcache feature for ncnn, which opened the door for running LLMs on the framework. Motivated by the spirit of open-source contribution, this repository organizes and expands upon that functionality into an independent project.
The goal is to provide a complete pipeline, making it easier for developers to use LLMs on ncnn and contribute to the ecosystem.
⚠️ Important Note: ncnn's support for `kvcache` is currently in an experimental stage. You must compile ncnn from the `master` branch to ensure you have the latest features required for this project to run.
The project is currently in active development. Below is the current compatibility status of various models.
These models run smoothly with the implemented tokenizer and inference pipeline.
- MiniCPM4-0.5B
- Qwen3 (0.6B)
- Qwen2.5-VL
- NLLB (No Language Left Behind)
These models can be loaded and run, but may experience bugs or suboptimal performance.
- Hunyuan 0.5B
These models should work in principle but currently fail or remain unverified.
- Qwen3-VL-2B-Instruct
- TinyLlama-1.1B-Chat-v1.0
- Qwen2.5-0.5B
- Llama-3.2-1B-Instruct
- DeepSeek-R1-Distill-Qwen-1.5B
- Hunyuan OCR
- PaddleOCR-VL
This project uses xmake for building.
git clone https://github.com/futz12/ncnn_llm.git
cd ncnn_llm
xmake build
Ensure you have downloaded the model weights (see below) before running.
xmake run minicpm4_main
Chat with MiniCPM4-0.5B! Type 'exit' or 'quit' to end the conversation.
User: Hello
Assistant:
Hello, I am your intelligent assistant. I can help you check the weather, news, music, translation, etc. Is there anything you need help with?
User: Do you know what OpenCV is?
Assistant: OpenCV (Open Source Computer Vision Library) is an open-source computer vision and machine learning software library. It contains many algorithms and tools for image and video processing...
`llm_ncnn_run` is a unified example that supports:
- Two modes: CLI chat (`--mode cli`) and OpenAI-style HTTP server (`--mode openai`)
- Built-in tools (random/add) plus external MCP tools
- Automatic model download from https://mirrors.sdu.edu.cn/ncnn_modelzoo/ (by parsing `model.json`)
xmake build llm_ncnn_run
xmake run llm_ncnn_run --mode cli --model qwen2.5_vl_3b
Notes:
- If `--model` is a bare name (no path separators), the model is downloaded to `./assets/<name>`.
- You can also pass an explicit path: `--model ./assets/qwen3_0.6b`.
xmake run llm_ncnn_run --mode openai --port 8080 --model qwen3_0.6b
Endpoints:
- http://localhost:8080/ (web chat)
- http://localhost:8080/v1/chat/completions (OpenAI-style API)
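The `/v1/chat/completions` endpoint follows the OpenAI chat-completions request shape. A minimal sketch of building such a request in Python (the model name and server URL here are assumptions taken from the commands above; adjust to your setup):

```python
import json
import urllib.request

# OpenAI-style chat-completions request body. The "model" value is
# assumed to match the name passed to --model when starting the server.
payload = {
    "model": "qwen3_0.6b",
    "messages": [
        {"role": "user", "content": "Hello, what can you do?"},
    ],
}

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# With the server running, the response follows the usual OpenAI schema:
#   reply = json.load(urllib.request.urlopen(req))
#   print(reply["choices"][0]["message"]["content"])
print(json.dumps(payload, indent=2))
```

Any OpenAI-compatible client library should also work by pointing its base URL at `http://localhost:8080/v1`.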
CLI mode:
xmake run llm_ncnn_run --mode cli --mcp-server "./my_mcp_server --flag"
OpenAI mode:
xmake run llm_ncnn_run --mode openai --port 8080 --mcp-server "./my_mcp_server --flag"
Common MCP flags: `--mcp-transport lsp|jsonl`, `--mcp-debug`, `--mcp-timeout-ms <n>`
If the model download fails due to TLS certificate errors, set the CA file path:
NCNN_LLM_CA_FILE=/etc/ssl/certs/ca-certificates.crt \
xmake run llm_ncnn_run --mode openai --model qwen2.5_vl_3b
You can download the converted ncnn-compatible model weights from the mirror: https://mirrors.sdu.edu.cn/ncnn_modelzoo/
We are committed to improving ncnn_llm. Our future plans include:
- Upstream Optimization: Submitting optimization patches directly to the upstream ncnn repository to improve core LLM support.
- Expanded Support: Adding support for more model architectures and tokenizers.
- Performance: Optimizing inference speed and reducing memory footprint.
- INT8 Quantization: Implementing INT8 quantization support.
- Documentation: Improving the export pipeline docs and adding more C++ usage examples.
Note: While we provide a complete export pipeline, older pipelines may become obsolete as the library evolves. Please refer to the latest example code for adjustments.
We welcome everyone to follow and contribute to this project, helping to advance ncnn's support for Large Language Models!
- QQ Group: 767178345
Apache 2.0