
No ONNX support for scaled_dot_product_attention #1752

Answered by rwightman
drexalt asked this question in Q&A


@jturner116 hrmm, didn't notice the ONNX issue. I don't understand why PyTorch always breaks things like this when they add new ops :/

Using the vit one as an example, does it work if you do something like this (note the is_tracing addition)?

        if self.fast_attn and not torch.jit.is_tracing():
            # fused kernel path; the ONNX exporter can't handle this op yet,
            # so skip it while tracing
            x = F.scaled_dot_product_attention(
                q, k, v,
                dropout_p=self.attn_drop.p,
            )
        else:
            # explicit attention fallback that traces and exports cleanly
            q = q * self.scale
            attn = q @ k.transpose(-2, -1)
            attn = attn.softmax(dim=-1)
            attn = self.attn_drop(attn)
            x = attn @ v
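
For context, here's a minimal, self-contained sketch of that pattern (the module layout and names are my own simplification, not timm's exact Attention class). torch.onnx.export drives the JIT tracer, so torch.jit.is_tracing() returns True during export and the explicit-matmul branch is the one that lands in the ONNX graph:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Attention(nn.Module):
        def __init__(self, dim, num_heads=8, attn_drop=0.):
            super().__init__()
            self.num_heads = num_heads
            self.head_dim = dim // num_heads
            self.scale = self.head_dim ** -0.5
            self.qkv = nn.Linear(dim, dim * 3)
            self.attn_drop = nn.Dropout(attn_drop)
            self.proj = nn.Linear(dim, dim)
            # only take the fast path if this torch build has the fused op
            self.fast_attn = hasattr(F, 'scaled_dot_product_attention')

        def forward(self, x):
            B, N, C = x.shape
            qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
            q, k, v = qkv.permute(2, 0, 3, 1, 4).unbind(0)  # each (B, heads, N, head_dim)

            if self.fast_attn and not torch.jit.is_tracing():
                # fused kernel, skipped under the tracer
                x = F.scaled_dot_product_attention(q, k, v, dropout_p=self.attn_drop.p)
            else:
                # plain matmul/softmax attention, exportable to ONNX
                q = q * self.scale
                attn = q @ k.transpose(-2, -1)
                attn = attn.softmax(dim=-1)
                attn = self.attn_drop(attn)
                x = attn @ v

            x = x.transpose(1, 2).reshape(B, N, C)
            return self.proj(x)

    # torch.onnx.export traces the model, so is_tracing() is True and the
    # else branch is what gets recorded into the exported graph
    model = Attention(dim=64).eval()
    dummy = torch.randn(1, 16, 64)
    torch.onnx.export(model, dummy, 'attn.onnx', opset_version=14)
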

Answer selected by drexalt