Conversation

@clouds1238

No description provided.

@CLAassistant

CLAassistant commented Dec 23, 2025

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@clouds1238 clouds1238 force-pushed the feat-flashmask-interface branch from b1cb960 to e577e85 Compare December 23, 2025 14:45
@zhangboSJTU

zhangboSJTU commented Dec 29, 2025

Hello,

I happened to submit a PR implementing the same functionality shortly before yours, see #97. I heard that your PR has already passed both accuracy and performance tests, so I would like to use your implementation to run flashmask in PyTorch.

However, after pulling your PR and running python setup.py install, I encountered a few issues:

  • There seems to be a missing __init__.py file with a version identifier.

  • The compiled package raises an undefined symbol error during import. I was able to work around this issue by setting DISABLE_HDIM64 = True.

  • At the moment, I am encountering another error (as shown in the screenshot below).

[screenshot of the error]
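For context, a flag like DISABLE_HDIM64 presumably skips compiling the head-dim-64 kernel instantiations so that the rest of the extension still builds and imports. A stdlib-only sketch of that kind of build gate (the flag and file names here are illustrative, not taken from the PR):

```python
def kernel_sources(disable_hdim64: bool) -> list:
    """Illustrative build gate: drop the head-dim-64 kernel
    instantiations when the flag is set, so the remaining sources
    still compile into an importable extension."""
    # Hypothetical file names, for illustration only.
    sources = ["flash_api_cuda.cu"]
    if not disable_hdim64:
        sources.append("instantiations/flash_fwd_hdim64_sm90.cu")
    return sources

print(kernel_sources(True))  # ['flash_api_cuda.cu']
```

Skipping an instantiation this way avoids the undefined-symbol error only if no other translation unit references the dropped kernels, which is why it is a workaround rather than a fix.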

Would it be possible for you to update the current PR with a complete, runnable version of the code that can successfully pass the existing PR tests (umiswing/test_flashmask#10)?

Thank you very much for your time and effort, and I really appreciate your work on this PR.

@clouds1238 clouds1238 force-pushed the feat-flashmask-interface branch from e577e85 to 3c43ad5 Compare December 29, 2025 04:01
@clouds1238
Author

I've added the missing __init__.py file. Please take a look.

@zhangboSJTU

zhangboSJTU commented Dec 29, 2025

I've added the missing __init__.py file. Please take a look.

What about the remaining errors? How can I solve them?

@clouds1238 clouds1238 force-pushed the feat-flashmask-interface branch from 3c43ad5 to 569d958 Compare December 29, 2025 07:34
@clouds1238
Author

I have updated flash_api_cuda.cu to fix the TMA stride configuration issue, and also fixed the path in setup.py.
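A TMA descriptor encodes the tensor's global-memory strides, so a descriptor built from strides that don't match the actual layout typically surfaces as exactly the kind of illegal-memory-access error seen here. A minimal, framework-free sketch of the dense row-major strides such a descriptor would expect (illustrative, not code from the PR):

```python
def row_major_strides(shape):
    """Element strides of a dense row-major (C-contiguous) layout.

    A TMA descriptor built for a contiguous tensor must use exactly
    these strides; any mismatch makes the hardware compute
    out-of-bounds addresses."""
    strides = [1] * len(shape)
    # Walk from the second-innermost dimension outward.
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides

print(row_major_strides((2, 3, 4)))  # [12, 4, 1]
```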

Environment:

  • PyTorch: 2.7.1
  • CUDA: 12.6

Verification:
Compiled successfully with: pip install -e . 2>&1 | tee build.log

Ready for review.

@zhangboSJTU

zhangboSJTU commented Dec 29, 2025

I pulled the code that is marked ready for review and ran the simplest test, the same as
https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/nn/functional/flash_attention.py#L1346
by adding that code to the end of the csrc/flashmask_v2/flashmask_interface.py file.
After pip install -e . I ran python flashmask_interface.py and got the same runtime error:

Error: Failed to initialize the TMA descriptor 700
CUDA error (/xxx/flash-attention/csrc/flashmask_v2/flash_fwd_launch_template.h:218): an illegal memory access was encountered

@@ -0,0 +1 @@
__version__ = "3.0.0.b1"

Suggested change
__version__ = "3.0.0.b1"
__version__ = "3.0.0b1"
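The suggested change matters because 3.0.0b1 is the canonical PEP 440 spelling; 3.0.0.b1 is only an acceptable non-canonical form that packaging tools normalize. A stdlib sketch of that normalization rule (simplified; the real rule in packaging.version also handles aliases like "beta" and other segments):

```python
import re

def canonicalize_pre(version: str) -> str:
    """Drop the separator before a pre-release tag, per PEP 440
    normalization: '3.0.0.b1' -> '3.0.0b1' (simplified rule)."""
    return re.sub(r"[._-](a|b|rc)(\d+)$", r"\1\2", version)

print(canonicalize_pre("3.0.0.b1"))  # 3.0.0b1
```

Writing the canonical form directly in __init__.py avoids depending on downstream tools to normalize it.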

long_description=""
# ninja build does not work unless include_dirs are abs path
this_dir = os.path.dirname(os.path.abspath(__file__))

Suggested change
if not os.path.exists("instantiations"):
subprocess.run(["python", os.path.join(this_dir, "generate_kernels.py"), "-o", "instantiations"], check=True)
else:
print("Instantiations directory exists, skipping kernel generation.")
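The check=True in the suggested change is what turns a failed kernel-generation step into a hard build error instead of a silent no-op. A small demonstration of that behavior (generic subprocess usage, not the PR's code):

```python
import subprocess
import sys

def run_checked(cmd):
    """Run a command; return 0 on success, its exit code on failure.

    With check=True, subprocess.run raises CalledProcessError on a
    non-zero exit instead of returning quietly."""
    try:
        subprocess.run(cmd, check=True)
        return 0
    except subprocess.CalledProcessError as e:
        return e.returncode

rc = run_checked([sys.executable, "-c", "import sys; sys.exit(3)"])
print(rc)  # 3
```

Without check=True, a broken generate_kernels.py would leave the instantiations directory empty and the failure would only show up later, at link or import time.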


@zhangboSJTU zhangboSJTU left a comment

LGTM. It has already passed both accuracy and performance tests. Thanks a lot; I will close #97.

@zhangboSJTU

zhangboSJTU commented Dec 30, 2025

I pulled the code that is marked ready for review and ran the simplest test, the same as https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/nn/functional/flash_attention.py#L1346, by adding that code to the end of the csrc/flashmask_v2/flashmask_interface.py file. After pip install -e . I ran python flashmask_interface.py and got the same runtime error:

Error: Failed to initialize the TMA descriptor 700
CUDA error (/xxx/flash-attention/csrc/flashmask_v2/flash_fwd_launch_template.h:218): an illegal memory access was encountered

I built paddlepaddle and ran the same test with paddle. I found that it goes to the _C.flashmask function rather than the _C.flashmask_v2 function ...
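The v1-vs-v2 dispatch mix-up described above reduces to a tiny Python sketch (the class and names below are stand-ins, not Paddle's actual bindings):

```python
class FakeBindings:
    """Stand-in for a C extension module exposing both ops."""

    @staticmethod
    def flashmask(*args):
        return "v1"

    @staticmethod
    def flashmask_v2(*args):
        return "v2"

def call_flashmask(bindings, use_v2=True):
    # The Python wrapper must select the v2 symbol explicitly;
    # falling through to the old name silently runs the v1 kernel.
    op = getattr(bindings, "flashmask_v2" if use_v2 else "flashmask")
    return op()

print(call_flashmask(FakeBindings))  # v2
```

If the wrapper in flash_attention.py still resolves the old symbol, the new kernels are never exercised, which would explain tests passing against the wrong implementation.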
