[Feature] Wanda++ for LM head pruning

### Background
Current Wanda (`torchao/sparsity/wanda.py`) prunes weights using a score that scales each weight's magnitude by the norm of its input activation: `|weight| * ||activation||` ([arXiv](https://arxiv.org/abs/2306.11695), Fig. 1)

<img width="600" height="300" alt="Image" src="https://github.com/user-attachments/assets/b660d9a5-44b7-44db-88c7-e5d3af77d877" />
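A minimal pure-Python sketch of this criterion follows. It assumes a linear layer whose weight is `out_features x in_features` and whose calibration inputs are the rows of `activations`. Following the Wanda paper, weights are compared within each output row. The names `wanda_scores` and `wanda_prune` are hypothetical and are not the `torchao` API.

```python
import math


def wanda_scores(weight, activations):
    """Wanda score |W_ij| * ||X_j||_2 for a weight of shape out_features x in_features.

    activations: n_samples x in_features calibration inputs to the layer.
    """
    in_features = len(weight[0])
    # L2 norm of each input feature (column j) across all calibration samples
    col_norms = [
        math.sqrt(sum(x[j] ** 2 for x in activations)) for j in range(in_features)
    ]
    return [[abs(w) * col_norms[j] for j, w in enumerate(row)] for row in weight]


def wanda_prune(weight, activations, sparsity):
    """Zero the lowest-scoring `sparsity` fraction of weights in each output row."""
    scores = wanda_scores(weight, activations)
    pruned = []
    for row, srow in zip(weight, scores):
        k = int(len(row) * sparsity)  # number of weights to remove from this row
        # Indices of the k smallest scores within this output row
        drop = set(sorted(range(len(row)), key=lambda j: srow[j])[:k])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned
```

For example, with `W = [[1.0, -2.0, 0.5, 3.0]]` and `X = [[3.0, 0.0, 4.0, 1.0], [4.0, 1.0, 3.0, 0.0]]`, `wanda_prune(W, X, 0.5)` returns `[[1.0, 0.0, 0.0, 3.0]]`. Magnitude-only pruning would have removed `1.0` and `0.5`, but the activation norms make `-2.0` the less important weight.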


### Feature: Wanda++

Recently, Wanda++ ([arXiv](https://arxiv.org/abs/2503.04992)) was published. Its main differences from Wanda can be summarized as follows:
- **Regional Gradient Score (RGS)**: Incorporates decoder-block-level (regional) gradients into the pruning score, alongside the Wanda magnitude-times-activation term
- **Regional Optimization (RO)**: Applies block-level weight fine-tuning after pruning, iteratively
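The next sketch shows how an RGS-style score could plug into the same per-row pruning. It **assumes** the score takes the form `(alpha * G_ij + ||X_j||_2) * |W_ij|`, where `G_ij` is the L2 norm over calibration samples of the gradient of a regional (decoder-block) loss with respect to `W_ij`. This is a reading of the paper that should be checked against it before implementing. Computing those gradients (backpropagating through a decoder block) is out of scope here, so `sample_grads` is assumed to be precomputed. The names `rgs_scores`, `prune_rows`, `sample_grads`, and `alpha` are hypothetical.

```python
import math


def rgs_scores(weight, activations, sample_grads, alpha):
    """RGS-style score, ASSUMING S_ij = (alpha * G_ij + ||X_j||_2) * |W_ij|.

    sample_grads: one out_features x in_features gradient matrix per calibration
    sample (assumed precomputed from a regional/decoder-block loss).
    """
    out_f, in_f = len(weight), len(weight[0])
    # Activation term, as in Wanda: L2 norm of each input feature over samples
    col_norms = [math.sqrt(sum(x[j] ** 2 for x in activations)) for j in range(in_f)]
    # Gradient term: per-weight L2 norm of the gradients over samples
    grad_norms = [
        [math.sqrt(sum(g[i][j] ** 2 for g in sample_grads)) for j in range(in_f)]
        for i in range(out_f)
    ]
    return [
        [(alpha * grad_norms[i][j] + col_norms[j]) * abs(weight[i][j]) for j in range(in_f)]
        for i in range(out_f)
    ]


def prune_rows(weight, scores, sparsity):
    """Zero the lowest-scoring `sparsity` fraction of weights in each output row."""
    pruned = []
    for row, srow in zip(weight, scores):
        k = int(len(row) * sparsity)
        drop = set(sorted(range(len(row)), key=lambda j: srow[j])[:k])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned
```

With `alpha=0.0` the score reduces to the Wanda score. In the example used in the tests, a large gradient on the `-2.0` weight keeps it alive under RGS even though plain Wanda would prune it. That is the kind of decision the gradient term is meant to change.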

How about extending Wanda into Wanda++? I am not certain I can implement the full RO algorithm, but Table 1 of the paper shows significant improvements even with RGS alone.

<img width="769" height="465" alt="Image" src="https://github.com/user-attachments/assets/b8186195-c2e6-4325-86b0-7034a54ce59b" />
