Efficient reasoning at scale by pruning redundant reasoning traces—without sacrificing accuracy.
Large language models (LLMs) often generate multiple reasoning traces in parallel to improve answer reliability. However, these traces frequently exhibit severe inter-trace redundancy, leading to wasted computation and inflated inference costs.
DeepPrune addresses this by learning to identify and prune semantically redundant traces before full execution—enabling cost-effective parallel reasoning while preserving performance.
More details can be found on our website.
cd DeepPrune
pip install -r requirements.txt
We use Llama-Factory for model fine-tuning and inference; the version we used is provided in the Llama-Factory folder, modified to support Focal Loss. Please refer to the GitHub issue if you want to clone LLaMA-Factory yourself and apply the modification.
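For context, Focal Loss down-weights easy (confidently classified) trace pairs so that training focuses on the hard ones. Below is a minimal PyTorch sketch of the binary form; alpha and gamma are illustrative defaults, not the values used in this repository.

```python
import torch
import torch.nn.functional as F

def binary_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss sketch (illustrative defaults, not the repo's settings).

    logits:  raw scores, shape (batch,)
    targets: 0/1 labels, shape (batch,)
    """
    targets = targets.float()
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-ce)  # model's probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    # Easy examples (p_t close to 1) are scaled down by (1 - p_t)^gamma.
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()
```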
We use Qwen/Qwen3-4B-Instruct-2507 as the backbone LLM for DeepPrune. You can download it from https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507, or substitute another open-source LLM.
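If you prefer scripting the download, a minimal sketch using huggingface_hub (the local_dir path below is just an example):

```python
from huggingface_hub import snapshot_download

# Download the backbone checkpoint; local_dir is an example path.
snapshot_download(
    repo_id="Qwen/Qwen3-4B-Instruct-2507",
    local_dir="models/Qwen3-4B-Instruct-2507",
)
```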
⚠️ The dataset provided here is incomplete due to its size. Please refer to https://huggingface.co/datasets/THU-KEG/DeepPrune for the full dataset.
To understand how to use the dataset, please refer to DeepPrune_data/README.md.
To understand the motivation behind DeepPrune, explore the preliminary analysis in:
📁 Preliminaries/Preliminary experiment.ipynb
This notebook includes:
- 📊 Distribution of answer agreement: most trace pairs yield the same final answer, revealing significant redundancy in parallel reasoning.
- 📈 ROC curves for redundancy detection:
  - Sentence-BERT (shallow similarity): AUROC = 0.58 → limited discriminative power.
  - Qwen3-4B-Instruct (zero-shot LLM comparison): AUROC = 0.66 → moderate improvement, but still suboptimal.
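For intuition, the shallow-similarity baseline scores each trace pair by embedding cosine similarity and asks how well that score predicts whether the two traces end in the same answer (AUROC). A minimal sketch, assuming a generic Sentence-BERT encoder and toy data rather than the notebook's actual setup:

```python
from sentence_transformers import SentenceTransformer, util
from sklearn.metrics import roc_auc_score

# Illustrative data: (trace_a, trace_b, same_answer) triples.
pairs = [
    ("... reasoning trace A ...", "... reasoning trace B ...", 1),
    ("... reasoning trace C ...", "... reasoning trace D ...", 0),
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # example encoder, not necessarily the one used
emb_a = encoder.encode([a for a, _, _ in pairs], convert_to_tensor=True)
emb_b = encoder.encode([b for _, b, _ in pairs], convert_to_tensor=True)

# Cosine similarity of each matched pair, used as the redundancy score.
scores = util.cos_sim(emb_a, emb_b).diagonal().cpu().numpy()
labels = [y for _, _, y in pairs]
print("AUROC:", roc_auc_score(labels, scores))
```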
🔧 To reproduce the zero-shot Qwen3-4B-Instruct results:
- Prepare the evaluation dataset using DeepPrune/Offline/Ablation_Study.ipynb
- Run Preliminaries/zero_shot_exp.py (a minimal zero-shot judging sketch follows this list)
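For intuition, zero-shot redundancy judging simply asks the backbone model whether two partial traces will reach the same final answer and reads off a yes/no. The prompt wording and generation settings below are illustrative assumptions, not the exact ones in Preliminaries/zero_shot_exp.py:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen3-4B-Instruct-2507"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype="auto", device_map="auto")

def judge_same_answer(trace_a: str, trace_b: str) -> bool:
    # Illustrative prompt; the real script's template may differ.
    prompt = (
        "Here are two partial reasoning traces for the same question.\n\n"
        f"Trace 1:\n{trace_a}\n\nTrace 2:\n{trace_b}\n\n"
        "Will they reach the same final answer? Reply with 'yes' or 'no'."
    )
    messages = [{"role": "user", "content": prompt}]
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=4, do_sample=False)
    # Decode only the newly generated tokens.
    reply = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    return reply.strip().lower().startswith("yes")
```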
- Install Llama-Factory
  ⚠️ Patch required: modify the codebase to support Focal Loss (see the GitHub issue for guidance).
Generate the supervised training data for DeepPrune:
jupyter notebook DeepPrune/finetuning/build_finetune_dataset.ipynb
This constructs pairwise trace comparisons labeled by answer equivalence.
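Concretely, each supervised example pairs two (possibly truncated) traces with a binary equivalence label. The schema below is a hypothetical illustration of such an example, not the notebook's exact format:

```python
# Hypothetical schema for one pairwise training example (field names are illustrative).
example = {
    "instruction": "Judge whether the two partial reasoning traces will reach the same final answer.",
    "input": "Trace 1:\n... first partial chain of thought ...\n\n"
             "Trace 2:\n... second partial chain of thought ...",
    "output": "same",  # or "different", derived from answer equivalence of the full traces
}
```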
Train the DeepPrune model using supervised fine-tuning:
- Config: DeepPrune/Offline/Qwen3_full_sft.yaml
- Framework: Llama-Factory
After training:
- Generate test data: DeepPrune/Offline/Ablation_Study.ipynb
- Evaluate performance: python DeepPrune/Offline/test_model_performance_parallel.py
- Visualize results: DeepPrune/Offline/check_model_output.ipynb
✅ Expect significant gains over shallow similarity baselines (AUROC > 0.83 in our experiments).
Deploy DeepPrune for real-time trace pruning during inference:
- Establish baselines:
  Run DeepPrune/Online/check_pass_k.ipynb to compute:
  - pass@1: accuracy with a single trace
  - cons@512: consensus accuracy with 512 traces
- Apply DeepPrune:
  python DeepPrune/Online/greedy_cluster_threshold.py
  This performs greedy clustering of traces using DeepPrune's similarity scores and prunes redundant ones (a minimal sketch follows this list).
- Trade-off control:
  Adjust the similarity threshold to balance:
  - 💰 Cost reduction (fewer traces executed)
  - 🎯 Performance retention (maintained consensus accuracy)
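For intuition, the online stage can be viewed as greedy clustering: each incoming trace is scored against one representative per existing cluster; if the best score clears the threshold the trace is judged redundant and pruned, otherwise it starts a new cluster, and the final answer is a majority vote over the survivors. The sketch below assumes a score_pair(a, b) -> float callable (e.g. the fine-tuned judge's "same answer" probability); both that interface and the code are illustrative, not the actual greedy_cluster_threshold.py implementation.

```python
from collections import Counter

def greedy_cluster_and_vote(traces, answers, score_pair, threshold=0.5):
    """Greedy clustering sketch: prune traces judged redundant with an existing cluster.

    traces:     list of (partial) reasoning traces
    answers:    final answer extracted from each trace (for the consensus vote)
    score_pair: assumed callable(trace_a, trace_b) -> redundancy score in [0, 1]
    threshold:  lower = prune more aggressively
    """
    clusters = []  # indices of cluster-representative traces
    for i, trace in enumerate(traces):
        best = max(
            (score_pair(traces[rep], trace) for rep in clusters),
            default=None,
        )
        if best is not None and best >= threshold:
            continue          # judged redundant with an existing cluster: prune
        clusters.append(i)    # novel line of reasoning: keep as a new representative

    # Consensus answer = majority vote over the surviving representatives.
    vote = Counter(answers[rep] for rep in clusters)
    return vote.most_common(1)[0][0], clusters
```

With this convention, a lower threshold prunes more aggressively (cheaper, but riskier), while a higher threshold keeps more traces and approaches the full cons@512 behaviour.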
This code repository builds on Llama-Factory, vllm, DeepScaleR, and DeepConf.
Thanks for their great work!
If you use DeepPrune in your research, please cite our work:
@article{tu2025deepprune,
title={DeepPrune: Parallel Scaling without Inter-trace Redundancy},
author={Shangqing Tu and Yaxuan Li and Yushi Bai and Lei Hou and Juanzi Li},
journal={arXiv preprint arXiv:2510.08483},
year={2025}
}
