Is sparse attention implemented? #458

@Da1sypetals

🚀 The feature, motivation and pitch

Is sparse attention implemented?

By sparse attention I mean that $q$, $k$, and $v$ are dense, but the attention mask is stored in COO or CSR format and, most importantly, the attention score matrix is never materialized in dense form (the dense matrix sometimes does not fit into VRAM).
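To make the request concrete, here is a minimal reference sketch of that computation in plain Python. It is not an API of this library or of PyG: the function name `csr_sparse_attention`, its argument layout, the $1/\sqrt{d}$ scaling, and the choice to return zeros for rows with no allowed keys are all assumptions made for illustration. The point is only that scores are computed for the stored (row, column) pairs, so memory scales with the number of nonzeros rather than with the full score matrix.

```python
import math


def csr_sparse_attention(q, k, v, indptr, indices):
    """Attention restricted to the nonzeros of a CSR-encoded boolean mask.

    q: list of n_q query vectors, each of length d
    k: list of n_k key vectors, each of length d
    v: list of n_k value vectors, each of length d_v
    indptr, indices: CSR structure of the mask; query i may attend to
        key columns indices[indptr[i]:indptr[i + 1]].
    """
    d = len(q[0])
    d_v = len(v[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for i, q_i in enumerate(q):
        cols = indices[indptr[i]:indptr[i + 1]]
        if not cols:
            # No allowed keys for this query: emit zeros (assumed convention).
            out.append([0.0] * d_v)
            continue
        # Scores exist only for stored entries; no dense n_q x n_k matrix.
        scores = [scale * sum(a * b for a, b in zip(q_i, k[j])) for j in cols]
        row_max = max(scores)  # subtract the row max for a stable softmax
        weights = [math.exp(s - row_max) for s in scores]
        total = sum(weights)
        row = [0.0] * d_v
        for w, j in zip(weights, cols):
            for c in range(d_v):
                row[c] += (w / total) * v[j][c]
        out.append(row)
    return out


if __name__ == "__main__":
    q = [[1.0, 0.0], [0.0, 1.0]]
    k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    # Query 0 attends to key 0 only; query 1 attends to keys 1 and 2.
    indptr = [0, 1, 3]
    indices = [0, 1, 2]
    print(csr_sparse_attention(q, k, v, indptr, indices))
```

A practical version would presumably run the same per-row loop as a fused GPU kernel over the CSR (or sorted COO) indices; the Python loops here only pin down the semantics being asked about.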

I searched both the PyG library and this library but did not find one. Please correct me if an implementation already exists.

Alternatives

No response

Additional context

No response
