Issues: flashinfer-ai/flashinfer
#682 · Deprecation Notice: Python 3.8 Wheel Support to End in future... · opened Dec 18, 2024 by yzh119
#1095 · How to combine APIs to implement the same functionality as torch.nn.functional.scaled_dot_product_attention · opened May 27, 2025 by alao556
#1074 · test_sampling.cu is not updated to the newer sampling kernel interface · opened May 20, 2025 by 842974287
#1057 · C++ test example fails for test_single_prefill [bug] · opened May 14, 2025 by swmobile
#1053 · Build fails because of "unknown" in metadata during installation · opened May 11, 2025 by IzhanVarsky
#1034 · Can flashinfer's CutlassSegmentGEMMSM90Run function be used for LoRA computation on H20? · opened Apr 23, 2025 by chenhongyu2048
#1027 · flashinfer.decode.single_decode_with_kv_cache: Floating point exception (core dumped) · opened Apr 20, 2025 by MenHimChan
#1023 · [Bug] FP8 scaling factors (k_scale/v_scale) not taking effect in BatchPrefillWithPagedKVCacheWrapper · opened Apr 17, 2025 by cscyuge
#1022 · Low performance of POD Attention compared to BatchPrefillWithPagedKVCache · opened Apr 17, 2025 by Edenzzzz
#978 · top_k_top_p_sampling_from_logits incompatible with torch.compile + CUDAGraph · opened Mar 28, 2025 by sharvil