

🌿 DeepPrune: Parallel Scaling without Inter-trace Redundancy

📃 Paper • ✈️ Project Page • 🤗 Model • 🤗 Data

Efficient reasoning at scale by pruning redundant reasoning traces—without sacrificing accuracy.


📌 Overview

Large language models (LLMs) often generate multiple reasoning traces in parallel to improve answer reliability. However, these traces frequently exhibit severe inter-trace redundancy, leading to wasted computation and inflated inference costs.

DeepPrune addresses this by learning to identify and prune semantically redundant traces before full execution—enabling cost-effective parallel reasoning while preserving performance.

More details can be found on our project website.

Results

(Results figure: see our paper or project page.)


📦 Dependencies

cd DeepPrune  
pip install -r requirements.txt

⚠️ We use LLaMA-Factory for model fine-tuning and inference; the version we used is provided in the Llama-Factory folder, modified to support Focal Loss. Please refer to the GitHub issue if you want to clone LLaMA-Factory yourself and apply the patch.
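
As a rough illustration of what the Focal Loss patch is doing (this is a minimal self-contained sketch, not the actual Llama-Factory modification; gamma=2.0 is a common default and not necessarily the value used here):

import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=None):
    """Focal loss as a drop-in replacement for cross-entropy.

    Down-weights well-classified examples by (1 - p_t)^gamma so training
    focuses on hard examples (e.g. the rarer "not redundant" trace pairs).
    """
    # Per-example cross-entropy (no reduction), optionally class-weighted.
    ce = F.cross_entropy(logits, targets, weight=alpha, reduction="none")
    p_t = torch.exp(-ce)                 # probability of the true class
    loss = (1.0 - p_t) ** gamma * ce
    return loss.mean()

# Example: 4 trace pairs, binary labels (1 = redundant, 0 = not redundant)
logits = torch.randn(4, 2)
targets = torch.tensor([1, 1, 0, 1])
print(focal_loss(logits, targets, gamma=2.0))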

⚠️ We use Qwen/Qwen3-4B-Instruct-2507 as the backbone LLM for DeepPrune. You can download it from https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507, or substitute another open-source LLM.


🧾 Dataset

⚠️ The dataset provided in this repository is incomplete due to its size. Please refer to https://huggingface.co/datasets/THU-KEG/DeepPrune for the full dataset.

For instructions on using the dataset, please refer to DeepPrune_data/README.md.


🧪 Preliminaries

To understand the motivation behind DeepPrune, explore the preliminary analysis in:

📁 Preliminaries/Preliminary experiment.ipynb

This notebook includes:

  • 📊 Distribution of answer agreement:
    Most trace pairs yield the same answer, revealing significant redundancy in parallel reasoning.

  • 📈 ROC curves for redundancy detection:

    • Sentence-BERT (shallow similarity): AUROC = 0.58 → limited discriminative power.
    • Qwen3-4B-Instruct (zero-shot LLM comparison): AUROC = 0.66 → moderate improvement, but still suboptimal.

🔧 To reproduce the zero-shot Qwen3-4B-Instruct results:

  1. Prepare the evaluation dataset using DeepPrune/Offline/Ablation_Study.ipynb
  2. Run Preliminaries/zero_shot_exp.py
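
The zero-shot baseline asks the backbone LLM whether two reasoning traces will lead to the same final answer. Below is a minimal sketch of this kind of pairwise query using vLLM; the prompt wording and answer parsing are illustrative placeholders, not the exact ones in Preliminaries/zero_shot_exp.py (which, e.g., applies the model's chat template).

from vllm import LLM, SamplingParams

# Illustrative prompt; the actual wording in zero_shot_exp.py may differ.
PROMPT = (
    "You are given two partial reasoning traces for the same problem.\n"
    "Trace A:\n{a}\n\nTrace B:\n{b}\n\n"
    "Will they reach the same final answer? Reply with 'yes' or 'no'."
)

llm = LLM(model="Qwen/Qwen3-4B-Instruct-2507")
params = SamplingParams(temperature=0.0, max_tokens=4)

def judge_redundant(trace_a: str, trace_b: str) -> bool:
    """Zero-shot redundancy judgment for a single trace pair."""
    out = llm.generate([PROMPT.format(a=trace_a, b=trace_b)], params)[0]
    return out.outputs[0].text.strip().lower().startswith("yes")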

🛠️ DeepPrune Pipeline

⚙️ Prerequisites

  • Install Llama-Factory
  • ⚠️ Patch required: Modify the codebase to support Focal Loss (see GitHub issue for guidance).

1️⃣ Prepare Finetuning Dataset

Generate the supervised training data for DeepPrune:

jupyter notebook DeepPrune/finetuning/build_finetune_dataset.ipynb

This constructs pairwise trace comparisons labeled by answer equivalence.
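
Conceptually, each training example is a pair of (possibly truncated) traces plus a label indicating whether their full completions end in the same answer. A hypothetical record in an Alpaca-style instruction format is sketched below; the exact schema produced by the notebook may differ.

import json

# Hypothetical pairwise example; the notebook's exact schema may differ.
example = {
    "instruction": "Judge whether the two reasoning traces below will "
                   "reach the same final answer. Answer 'yes' or 'no'.",
    "input": "Trace A:\n<first partial trace>\n\nTrace B:\n<second partial trace>",
    "output": "yes",  # label derived from answer equivalence of the full traces
}

with open("deepprune_pairs.json", "w") as f:
    json.dump([example], f, ensure_ascii=False, indent=2)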


2️⃣ Offline Training

Train the DeepPrune model using supervised fine-tuning:

  • Config: DeepPrune/Offline/Qwen3_full_sft.yaml
  • Framework: Llama-Factory

After training:

  1. Generate test data: DeepPrune/Offline/Ablation_Study.ipynb
  2. Evaluate performance:
    python DeepPrune/Offline/test_model_performance_parallel.py
  3. Visualize results: DeepPrune/Offline/check_model_output.ipynb

✅ Expect significant gains over shallow similarity baselines (AUROC > 0.83 in our experiments).
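
If you want to score the pairwise predictions yourself, AUROC can be computed directly from the predicted redundancy scores and the answer-equivalence labels, e.g. with scikit-learn. The file name and field names below are placeholders, not the evaluation script's actual output format.

import json
from sklearn.metrics import roc_auc_score

# Placeholder file of per-pair records: {"label": 0/1, "score": float}
with open("pair_predictions.jsonl") as f:
    records = [json.loads(line) for line in f]

labels = [r["label"] for r in records]   # 1 = same final answer (redundant)
scores = [r["score"] for r in records]   # model's predicted redundancy score

print(f"AUROC = {roc_auc_score(labels, scores):.3f}")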


3️⃣ Online Pruning

Deploy DeepPrune for real-time trace pruning during inference:

  1. Establish baselines:
    Run DeepPrune/Online/check_pass_k.ipynb to compute:

    • pass@1: Accuracy with a single trace
    • cons@512: Consensus accuracy with 512 traces
  2. Apply DeepPrune:

    python DeepPrune/Online/greedy_cluster_threshold.py

    This greedily clusters traces using DeepPrune's similarity scores and prunes the redundant ones (a simplified sketch of the clustering logic follows this list).

  3. Trade-off control:
    Adjust the similarity threshold to balance:

    • 💰 Cost reduction (fewer traces executed)
    • 🎯 Performance retention (maintained consensus accuracy)
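
The greedy clustering step can be pictured as follows: traces are processed one by one; each new trace is compared against a representative of every existing cluster, and if the predicted redundancy score exceeds the threshold it joins that cluster (and is pruned), otherwise it starts a new cluster. This is a simplified sketch of the idea, not the exact logic in greedy_cluster_threshold.py; score_pair stands in for the DeepPrune judge.

from typing import Callable, List

def greedy_cluster(traces: List[str],
                   score_pair: Callable[[str, str], float],
                   threshold: float = 0.5) -> List[List[int]]:
    """Greedily group traces; only the first trace of each cluster is kept.

    score_pair(a, b) is assumed to return the predicted probability that
    traces a and b lead to the same final answer (higher = more redundant).
    """
    clusters: List[List[int]] = []       # each cluster stores trace indices
    for i, trace in enumerate(traces):
        for cluster in clusters:
            representative = traces[cluster[0]]
            if score_pair(representative, trace) >= threshold:
                cluster.append(i)        # redundant: prune this trace
                break
        else:
            clusters.append([i])         # novel: keep and continue this trace
    return clusters

# Raising `threshold` keeps more traces (higher cost, safer consensus);
# lowering it prunes more aggressively.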

🙏 Acknowledgement

This code repository is developed based on LLaMA-Factory, vLLM, DeepScaleR, and DeepConf.

Thanks for their great work!


📜 Citation

If you use DeepPrune in your research, please cite our work:

@article{tu2025deepprune,
  title={DeepPrune: Parallel Scaling without Inter-trace Redundancy},
  author={Tu, Shangqing and Li, Yaxuan and Bai, Yushi and Hou, Lei and Li, Juanzi},
  journal={arXiv preprint arXiv:2510.08483},
  year={2025}
}
