Commit 7cc93cf
[webgpu] Apply Flash Attention if sliding window exceeds KV cache length (#25594)
### Description
#25372 added sliding window support for Group Query Attention and disabled
Flash Attention, because Flash Attention does not yet support sliding windows.
This PR adds a check for the sliding window and applies Flash Attention
when the window size exceeds the KV cache length or total sequence
length.
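
The condition described above can be sketched as follows. This is only an illustration, not the code from the diff: the names `can_use_flash_attention`, `visible_keys`, and `other_conditions_met` are hypothetical, and treating `local_window_size == -1` as "no sliding window" is an assumption about the operator's convention. The helper `visible_keys` shows why the fallback is safe: once the window exceeds the total sequence length, the windowed causal mask matches the plain causal mask.

```python
from typing import List, Optional


def can_use_flash_attention(local_window_size: int,
                            total_sequence_length: int,
                            other_conditions_met: bool = True) -> bool:
    """Hypothetical eligibility check (not the actual ONNX Runtime code).

    Assumption: local_window_size == -1 means sliding window is disabled.
    """
    window_inactive = (local_window_size == -1
                       or local_window_size > total_sequence_length)
    return other_conditions_met and window_inactive


def visible_keys(total_sequence_length: int,
                 window: Optional[int] = None) -> List[List[int]]:
    """For each query position i, list the key positions j it may attend to.

    Causal rule: j <= i. Sliding-window rule: i - j < window.
    """
    rows = []
    for i in range(total_sequence_length):
        row = [j for j in range(i + 1)
               if window is None or i - j < window]
        rows.append(row)
    return rows


if __name__ == "__main__":
    total = 8
    # A window larger than the sequence masks nothing extra.
    print(visible_keys(total, window=16) == visible_keys(total))
    # A smaller window does change the mask, so the fallback must not apply.
    print(visible_keys(total, window=4) == visible_keys(total))
    print(can_use_flash_attention(16, total), can_use_flash_attention(4, total))
```

The strict `>` comparison follows the wording "exceeds" in the description; whether the actual check also accepts equality is not stated here.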
### Motivation and Context
See above.

1 parent a120b4b commit 7cc93cf
1 file changed: 4 additions & 1 deletion
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 198 | 198 | |
| 199 | 199 | |
| 200 | 200 | |
| | 201 | + |
| | 202 | + |
| | 203 | + |
| 201 | 204 | |
| 202 | 205 | |
| 203 | | - |
| | 206 | + |
| 204 | 207 | |
| 205 | 208 | |
| 206 | 209 | |