Background
The current Wanda implementation (torchao/sparsity/wanda.py) uses a magnitude-based pruning criterion: |weight| * ||activation|| (arXiv, Fig. 1).
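
For reference, here is a minimal sketch of that criterion. It uses plain Python lists so it runs without dependencies, and it is not the code in torchao/sparsity/wanda.py. The function names, the toy values, and the choice to compare scores within each output row are assumptions made for illustration.

```python
import math

def activation_norms(activations):
    """L2 norm of each input feature across calibration samples.

    activations: list of rows with shape (num_samples, in_features).
    """
    in_features = len(activations[0])
    return [
        math.sqrt(sum(row[j] ** 2 for row in activations))
        for j in range(in_features)
    ]

def wanda_scores(weight, activations):
    """score[i][j] = |weight[i][j]| * ||activations[:, j]||_2."""
    norms = activation_norms(activations)
    return [[abs(w) * n for w, n in zip(row, norms)] for row in weight]

def prune_per_output(weight, scores, sparsity):
    """Zero the lowest-scoring fraction of weights within each output row."""
    pruned = []
    for w_row, s_row in zip(weight, scores):
        num_prune = int(len(w_row) * sparsity)
        # Indices sorted by ascending score; the first num_prune are dropped.
        order = sorted(range(len(w_row)), key=lambda j: s_row[j])
        drop = set(order[:num_prune])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(w_row)])
    return pruned

# Toy data (hypothetical): 2 output rows, 4 input features, 2 samples.
weight = [[0.5, -2.0, 0.1, 1.0],
          [1.5, 0.2, -0.3, 0.4]]
activations = [[3.0, 0.1, 4.0, 1.0],
               [4.0, 0.1, 3.0, 1.0]]

scores = wanda_scores(weight, activations)
pruned = prune_per_output(weight, scores, sparsity=0.5)
```

In the first row the weight with the largest magnitude (-2.0) is pruned because its input feature has a very small activation norm, which is the behaviour that separates this criterion from plain magnitude pruning.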

Feature: Wanda++
Wanda++ (arXiv) was published recently. Its main differences from Wanda are:
- Regional Gradient Score (RGS): Adds block-level (regional) gradients to the pruning score instead of relying on weight magnitude and activation norm alone
- Regional Optimization (RO): Block-level weight fine-tuning after pruning, applied iteratively
How about extending Wanda to Wanda++? I am not certain I can handle the full RO algorithm, but Table 1 of the paper shows significant improvements even with RGS alone.
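
As a rough sketch of what an RGS-only score could look like, the snippet below assumes the score has the form (alpha * G + ||activation||) * |weight|, where G is a per-weight gradient magnitude obtained from a block-level loss. That form should be checked against the paper before implementing. The function name `rgs_scores`, the precomputed `grad_magnitude` input, and the `alpha` values are hypothetical and chosen only for illustration.

```python
def rgs_scores(weight, act_norms, grad_magnitude, alpha):
    """Assumed RGS form: (alpha * G[i][j] + act_norms[j]) * |weight[i][j]|.

    grad_magnitude is taken as given here; in practice it would come from
    backpropagating a block-level loss on calibration data.
    """
    return [
        [(alpha * g + n) * abs(w) for w, g, n in zip(w_row, g_row, act_norms)]
        for w_row, g_row in zip(weight, grad_magnitude)
    ]

# Toy data (hypothetical): one output row, four input features.
weight = [[0.5, -2.0, 0.1, 1.0]]
act_norms = [5.0, 0.1, 5.0, 1.0]
grad_magnitude = [[0.0, 0.5, 0.0, 0.0]]

# alpha = 0 reduces the assumed score to the Wanda criterion.
wanda_like = rgs_scores(weight, act_norms, grad_magnitude, alpha=0.0)
with_grad = rgs_scores(weight, act_norms, grad_magnitude, alpha=10.0)

lowest_without_grad = min(range(4), key=lambda j: wanda_like[0][j])
highest_with_grad = max(range(4), key=lambda j: with_grad[0][j])
```

With these toy numbers, the second weight has the lowest score when alpha is 0 and the highest once the gradient term is included, so the gradient term can change which weights survive pruning. Under this assumed form, only the score computation changes and the existing masking step could stay as it is.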
