Removed redundant templates and related compile-time/runtime code #91
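This is a reopened version of #87 that resolves the conflicts with #89. This PR also contains the code from #86, so #86 has been closed; everything was tested together in this PR (including #81 as well). For the #86-related optimizations, see the #86 PR description.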
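Initial simplification of FMv3's template expressions:

- Split-related logic (including simplifying the redundant fast_divmod modules in PPT/DualPPTX)
- `Is_flashmask` bool template arg
- `IntraWGOverlap` bool template arg, which is always True by default

Benchmark: performance is unchanged for all configurations except seqlen = 128, which improves because it now uses static scheduling. Correctness has been verified by tests (bit-for-bit match).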
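Removing bool template arguments like the ones listed above matters because each one doubles the number of kernel variants that a runtime-to-compile-time dispatch has to instantiate. The following is a hypothetical, host-only C++17 sketch of that effect: `launch_before`, `dispatch_before` and `launch_after` are made-up names, not FMv3 functions, and it assumes both flags are effectively always true in FMv3, so folding them away leaves a single instantiation.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a kernel launcher; in real code each template
// instantiation would be a separately compiled CUDA kernel.
// Before: two bool template parameters -> up to 2 x 2 = 4 instantiations.
template <bool Is_flashmask, bool IntraWGOverlap>
std::string launch_before() {
    std::string name = "fwd";
    if constexpr (Is_flashmask) name += "_flashmask";
    if constexpr (IntraWGOverlap) name += "_overlap";
    return name;
}

// Runtime flags mapped to template arguments (BOOL_SWITCH style):
// every combination reachable here must be instantiated and compiled.
std::string dispatch_before(bool is_flashmask, bool intra_wg_overlap) {
    if (is_flashmask) {
        return intra_wg_overlap ? launch_before<true, true>()
                                : launch_before<true, false>();
    }
    return intra_wg_overlap ? launch_before<false, true>()
                            : launch_before<false, false>();
}

// After (assumption): both flags are always true, so the parameters and the
// dispatch layer are removed and only one variant is compiled.
std::string launch_after() {
    return "fwd_flashmask_overlap";
}
```

The path that actually runs is unchanged (`dispatch_before(true, true)` and `launch_after()` produce the same variant); only the three unused instantiations disappear, which typically shortens compile time and reduces binary size.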
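To avoid conflicts with earlier PRs that have not been merged yet, this PR should be merged after #81 and #86. Merging #81 requires resolving conflicts manually, and after #86 is merged this PR will need a rebase.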
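Substantially simplified `tile_scheduler.h`: removed unnecessary implementations and moved the shared parts into a base class. Added a stride setting to PPT, since using a stride is beneficial for some mask types.

TODO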