We've come up with a training recipe for 2:4 activation sparsity, which is outlined in this paper: https://openreview.net/pdf?id=O5feVk7p6Y
The gist of this approach is that:
- We find high levels of activation sparsity (>85%) when training Squared-ReLU based FFNs instead of SwiGLU FFNs. These Squared-ReLU FFNs show minimal to no accuracy loss.
- We accelerate the sparse activation x dense weight matmul with 2:4 sparsity. For the forward pass we can naively sparsify, dropping values that do not fit the 2:4 constraint. For the backward pass we need some special sauce to maintain accuracy. (A minimal sketch of both pieces follows below.)
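To make the two bullets above concrete, here is a minimal sketch (not the paper's training recipe): a Squared-ReLU FFN and a naive "keep the top-2 of every 4" pruning of its activations. The module and function names are illustrative, not from the paper or an existing library.

```python
import torch
import torch.nn as nn


class SquaredReLUFFN(nn.Module):
    """FFN block using Squared-ReLU (relu(x) ** 2) instead of SwiGLU."""

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.w_in = nn.Linear(d_model, d_ff, bias=False)
        self.w_out = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Squared-ReLU activations are highly sparse (>85% zeros per the paper).
        h = torch.relu(self.w_in(x)) ** 2
        return self.w_out(h)


def naive_24_sparsify(x: torch.Tensor) -> torch.Tensor:
    """Keep the 2 largest-magnitude values in every contiguous group of 4,
    zeroing the rest so the tensor satisfies the 2:4 structured constraint."""
    orig_shape = x.shape
    groups = x.reshape(-1, 4)
    # Indices of the top-2 magnitudes within each group of 4.
    topk = groups.abs().topk(2, dim=-1).indices
    mask = torch.zeros_like(groups).scatter_(-1, topk, 1.0)
    return (groups * mask).reshape(orig_shape)


if __name__ == "__main__":
    ffn = SquaredReLUFFN(d_model=8, d_ff=16)
    x = torch.randn(2, 8)
    h = torch.relu(ffn.w_in(x)) ** 2
    h_24 = naive_24_sparsify(h)
    print("nonzeros per group of 4:", h_24.reshape(-1, 4).ne(0).sum(-1))
```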
However, @janeyx99 pointed out to me that instead of accelerating the model with 2:4 sparsity, we could exploit the high activation sparsity from the first point via activation compression: use something like nvcomp to compress the sparse Squared-ReLU activations.
We should run some tests to see what compression ratio, and thus what memory savings, we can achieve, as well as whether the compression adds overhead we need to account for.
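As a starting point, here is a rough sketch of the kind of measurement we'd want. It uses zlib as a CPU stand-in purely to get a ballpark ratio; the real experiment would compress on-GPU with nvcomp. The ~90% sparsity level and tensor shape are assumptions for illustration.

```python
import time
import zlib

import torch


def fake_squared_relu_activations(shape, sparsity=0.9):
    """Generate activations with roughly the target fraction of exact zeros."""
    x = torch.relu(torch.randn(shape)) ** 2
    # Force the target sparsity by zeroing the smallest values.
    k = int(sparsity * x.numel())
    threshold = x.flatten().kthvalue(k).values
    return torch.where(x <= threshold, torch.zeros_like(x), x)


def compression_stats(x: torch.Tensor, level: int = 1):
    """Return (compression ratio, compression time in seconds) for the raw fp16 bytes."""
    raw = x.to(torch.float16).numpy().tobytes()
    start = time.perf_counter()
    compressed = zlib.compress(raw, level)
    elapsed = time.perf_counter() - start
    return len(raw) / len(compressed), elapsed


if __name__ == "__main__":
    acts = fake_squared_relu_activations((4096, 4096), sparsity=0.9)
    ratio, secs = compression_stats(acts)
    print(f"sparsity: {acts.eq(0).float().mean():.2%}")
    print(f"compression ratio: {ratio:.2f}x, compress time: {secs * 1000:.1f} ms")
```

This only estimates the ratio on real-valued but synthetic activations; the overhead question really needs an on-GPU nvcomp benchmark on activations dumped from an actual Squared-ReLU run, ideally overlapped with compute.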