Low-bit Shampoo #1257

Open

@msaroufim

Description

Opening this on behalf of @winglian

An optimizer that many folks have been interested in is Shampoo (https://arxiv.org/abs/1802.09568). Its proponents report faster convergence because it approximates second-order (full-matrix AdaGrad) preconditioning with Kronecker-factored gradient statistics, while still keeping memory requirements in check. The core update is sketched below.
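For reference, a minimal, illustrative sketch of the core Shampoo update for a single 2D parameter, following the paper (no momentum, grafting, or amortized root computation, all of which real implementations add; the eigendecomposition-based inverse fourth root and the function names here are my own choices, not an existing API):

```python
import torch

def inv_fourth_root(M, eps=1e-6):
    # M^(-1/4) via eigendecomposition; real implementations amortize this cost.
    eigvals, eigvecs = torch.linalg.eigh(M)
    return eigvecs @ torch.diag(eigvals.clamp(min=eps).pow(-0.25)) @ eigvecs.T

@torch.no_grad()
def shampoo_step(W, G, L, R, lr=1e-3):
    # L (m x m) and R (n x n) accumulate Kronecker-factored second-moment
    # statistics; initialize them as eps * I before the first step.
    L += G @ G.T
    R += G.T @ G
    # Preconditioned update: W <- W - lr * L^(-1/4) @ G @ R^(-1/4)
    W -= lr * inv_fourth_root(L) @ G @ inv_fourth_root(R)
```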

To keep the memory requirements in check even further, we can quantize its optimizer states! There are existing papers with good recipes for how this could work in int4, e.g. https://arxiv.org/abs/2405.18144.
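The common building block in these recipes is block-wise quantization of the state tensors. A sketch of symmetric absmax quantization to int8 (the int4 variants pack two codes per byte on top of the same idea; these helper names are illustrative, not torchao APIs):

```python
import torch
import torch.nn.functional as F

def quantize_blockwise(x, block_size=128):
    # Split the state into fixed-size blocks and store int8 codes plus one
    # fp32 scale per block.
    flat = x.float().flatten()
    pad = (-flat.numel()) % block_size
    blocks = F.pad(flat, (0, pad)).view(-1, block_size)
    scales = blocks.abs().amax(dim=1, keepdim=True).clamp(min=1e-12) / 127
    codes = (blocks / scales).round().clamp(-127, 127).to(torch.int8)
    return codes, scales

def dequantize_blockwise(codes, scales, numel):
    # Invert the mapping and strip the padding.
    return (codes.float() * scales).flatten()[:numel]
```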

As far as implementing the work goes, we have many reference examples for int8, int4, and fp8 Adam and AdamW at https://github.com/pytorch/ao/tree/main/torchao/prototype/low_bit_optim, and there is an in-progress contribution in #1231.
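The existing prototype optimizers are drop-in replacements for their torch.optim counterparts; usage looks roughly like this (class names per the low_bit_optim README at the time of writing, so double-check against the current code):

```python
import torch
from torchao.prototype.low_bit_optim import AdamW8bit  # also AdamW4bit, AdamWFp8

model = torch.nn.Linear(1024, 1024).cuda()
optim = AdamW8bit(model.parameters(), lr=1e-3)

# Standard training-step shape: the quantized states are managed internally.
loss = model(torch.randn(8, 1024, device="cuda")).sum()
loss.backward()
optim.step()
optim.zero_grad()
```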

Ideally, the work above can be turned into a guide on how to implement a new low-bit optimizer, so that someone who already understands the optimizer they want to implement can follow it and get a new one working in about a day. The general pattern such a guide would cover is sketched after this paragraph.
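The overall pattern is: keep the state quantized between steps, then dequantize -> update -> requantize inside step(). A hypothetical sketch reusing the block-wise helpers from above (SGD with momentum chosen only because it has a single state tensor; the actual torchao implementation instead wraps states in tensor subclasses so one optimizer implementation serves all formats):

```python
import torch

class LowBitSGDMomentum(torch.optim.Optimizer):
    def __init__(self, params, lr=1e-2, momentum=0.9):
        super().__init__(params, dict(lr=lr, momentum=momentum))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if not state:  # lazily create the quantized momentum buffer
                    state["codes"], state["scales"] = quantize_blockwise(torch.zeros_like(p))
                # Dequantize, apply the usual momentum update, requantize.
                buf = dequantize_blockwise(state["codes"], state["scales"], p.numel()).view_as(p)
                buf.mul_(group["momentum"]).add_(p.grad)
                p.add_(buf, alpha=-group["lr"])
                state["codes"], state["scales"] = quantize_blockwise(buf)
```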

cc @gau-nernst @andrewor14 @vkuzo @janeyx99 @supriyar
