Low-bit Shampoo #1257

Open

@msaroufim

Description

Opening this on behalf of @winglian

An optimizer that many folks have been interested in is Shampoo (https://arxiv.org/abs/1802.09568). Its proponents report faster convergence because it approximates second-order (full-matrix AdaGrad) preconditioning with Kronecker-factored gradient statistics, while still keeping memory requirements in check. The core update is sketched below.
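For reference, a minimal, illustrative sketch of the core Shampoo update for a single 2D parameter, following the paper (no momentum, grafting, or amortized root computation, all of which real implementations add; the eigendecomposition-based inverse fourth root and the function names here are my own choices, not an existing API):

```python
import torch

def inv_fourth_root(M, eps=1e-6):
    # M^(-1/4) via eigendecomposition; real implementations amortize this cost.
    eigvals, eigvecs = torch.linalg.eigh(M)
    return eigvecs @ torch.diag(eigvals.clamp(min=eps).pow(-0.25)) @ eigvecs.T

@torch.no_grad()
def shampoo_step(W, G, L, R, lr=1e-3):
    # L (m x m) and R (n x n) accumulate Kronecker-factored second-moment
    # statistics; initialize them as eps * I before the first step.
    L += G @ G.T
    R += G.T @ G
    # Preconditioned update: W <- W - lr * L^(-1/4) @ G @ R^(-1/4)
    W -= lr * inv_fourth_root(L) @ G @ inv_fourth_root(R)
```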

To keep the memory requirements in check even further, we can quantize its optimizer states! There are existing papers with good recipes for how this could work in int4, e.g. https://arxiv.org/abs/2405.18144.
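The common building block in these recipes is block-wise quantization of the state tensors. A sketch of symmetric absmax quantization to int8 (the int4 variants pack two codes per byte on top of the same idea; these helper names are illustrative, not torchao APIs):

```python
import torch
import torch.nn.functional as F

def quantize_blockwise(x, block_size=128):
    # Split the state into fixed-size blocks and store int8 codes plus one
    # fp32 scale per block.
    flat = x.float().flatten()
    pad = (-flat.numel()) % block_size
    blocks = F.pad(flat, (0, pad)).view(-1, block_size)
    scales = blocks.abs().amax(dim=1, keepdim=True).clamp(min=1e-12) / 127
    codes = (blocks / scales).round().clamp(-127, 127).to(torch.int8)
    return codes, scales

def dequantize_blockwise(codes, scales, numel):
    # Invert the mapping and strip the padding.
    return (codes.float() * scales).flatten()[:numel]
```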

As far as implementing the work goes, we have many reference examples for int8, int4, and fp8 Adam and AdamW at https://github.com/pytorch/ao/tree/main/torchao/prototype/low_bit_optim, and there is an in-progress contribution in #1231.
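The existing prototype optimizers are drop-in replacements for their torch.optim counterparts; usage looks roughly like this (class names per the low_bit_optim README at the time of writing, so double-check against the current code):

```python
import torch
from torchao.prototype.low_bit_optim import AdamW8bit  # also AdamW4bit, AdamWFp8

model = torch.nn.Linear(1024, 1024).cuda()
optim = AdamW8bit(model.parameters(), lr=1e-3)

# Standard training-step shape: the quantized states are managed internally.
loss = model(torch.randn(8, 1024, device="cuda")).sum()
loss.backward()
optim.step()
optim.zero_grad()
```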

Ideally, the work above can be turned into a guide on how to implement a new low-bit optimizer, so that someone who already understands the optimizer they want to implement can follow it and get a new one working in about a day. The general pattern such a guide would cover is sketched after this paragraph.
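The overall pattern is: keep the state quantized between steps, then dequantize -> update -> requantize inside step(). A hypothetical sketch reusing the block-wise helpers from above (SGD with momentum chosen only because it has a single state tensor; the actual torchao implementation instead wraps states in tensor subclasses so one optimizer implementation serves all formats):

```python
import torch

class LowBitSGDMomentum(torch.optim.Optimizer):
    def __init__(self, params, lr=1e-2, momentum=0.9):
        super().__init__(params, dict(lr=lr, momentum=momentum))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if not state:  # lazily create the quantized momentum buffer
                    state["codes"], state["scales"] = quantize_blockwise(torch.zeros_like(p))
                # Dequantize, apply the usual momentum update, requantize.
                buf = dequantize_blockwise(state["codes"], state["scales"], p.numel()).view_as(p)
                buf.mul_(group["momentum"]).add_(p.grad)
                p.add_(buf, alpha=-group["lr"])
                state["codes"], state["scales"] = quantize_blockwise(buf)
```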

cc @gau-nernst @andrewor14 @vkuzo @janeyx99 @supriyar
