Conversation

@clouds1238

No description provided.

@CLAassistant

CLAassistant commented Dec 23, 2025

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@clouds1238 clouds1238 force-pushed the feat-flashmask-interface branch from b1cb960 to e577e85 Compare December 23, 2025 14:45
@zhangboSJTU

zhangboSJTU commented Dec 29, 2025

Hello,

I happened to submit a PR implementing the same functionality shortly before yours, see #97. I heard that your PR has already passed both accuracy and performance tests, so I would like to use your implementation to run flashmask in PyTorch.

However, after pulling your PR and running python setup.py install, I encountered a few issues:

  • There seems to be a missing __init__.py file with a version identifier.

  • The compiled package raises an undefined symbol error during import. I was able to work around this issue by setting DISABLE_HDIM64 = True.

  • At the moment, I am encountering another error (as shown in the screenshot below).

[screenshot of the error]
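For context, a flag like DISABLE_HDIM64 presumably skips compiling the head-dim-64 kernel instantiations so that the rest of the extension still builds and imports. A stdlib-only sketch of that kind of build gate (the flag and file names here are illustrative, not taken from the PR):

```python
def kernel_sources(disable_hdim64: bool) -> list:
    """Illustrative build gate: drop the head-dim-64 kernel
    instantiations when the flag is set, so the remaining sources
    still compile into an importable extension."""
    # Hypothetical file names, for illustration only.
    sources = ["flash_api_cuda.cu"]
    if not disable_hdim64:
        sources.append("instantiations/flash_fwd_hdim64_sm90.cu")
    return sources

print(kernel_sources(True))  # ['flash_api_cuda.cu']
```

Skipping an instantiation this way avoids the undefined-symbol error only if no other translation unit references the dropped kernels, which is why it is a workaround rather than a fix.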

Would it be possible for you to update the current PR with a complete, runnable version of the code that can successfully pass the existing PR tests (umiswing/test_flashmask#10)?

Thank you very much for your time and effort, and I really appreciate your work on this PR.

@clouds1238 clouds1238 force-pushed the feat-flashmask-interface branch from e577e85 to 3c43ad5 Compare December 29, 2025 04:01
@clouds1238
Author

I've added the missing __init__.py file. Please take a look.

@zhangboSJTU

zhangboSJTU commented Dec 29, 2025

I've added the missing __init__.py file. Please take a look.

What about the remaining errors? How can I solve them?

@clouds1238 clouds1238 force-pushed the feat-flashmask-interface branch from 3c43ad5 to 569d958 Compare December 29, 2025 07:34
@clouds1238
Author

I have updated flash_api_cuda.cu to fix the TMA stride configuration issue, and also fixed the path in setup.py.
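A TMA descriptor encodes the tensor's global-memory strides, so a descriptor built from strides that don't match the actual layout typically surfaces as exactly the kind of illegal-memory-access error seen here. A minimal, framework-free sketch of the dense row-major strides such a descriptor would expect (illustrative, not code from the PR):

```python
def row_major_strides(shape):
    """Element strides of a dense row-major (C-contiguous) layout.

    A TMA descriptor built for a contiguous tensor must use exactly
    these strides; any mismatch makes the hardware compute
    out-of-bounds addresses."""
    strides = [1] * len(shape)
    # Walk from the second-innermost dimension outward.
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides

print(row_major_strides((2, 3, 4)))  # [12, 4, 1]
```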

Environment:

  • PyTorch: 2.7.1
  • CUDA: 12.6

Verification:
Compiled successfully with: pip install -e . 2>&1 | tee build.log

Ready for review.

@zhangboSJTU

zhangboSJTU commented Dec 29, 2025

I pulled the code that is marked ready for review and ran the simplest test, the same as
https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/nn/functional/flash_attention.py#L1346
by adding that code to the end of the csrc/flashmask_v2/flashmask_interface.py file.
After pip install -e . I ran python flashmask_interface.py and got the same runtime error:

Error: Failed to initialize the TMA descriptor 700
CUDA error (/xxx/flash-attention/csrc/flashmask_v2/flash_fwd_launch_template.h:218): an illegal memory access was encountered

@@ -0,0 +1 @@
__version__ = "3.0.0.b1"

Suggested change
__version__ = "3.0.0.b1"
__version__ = "3.0.0b1"
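The suggested change matters because 3.0.0b1 is the canonical PEP 440 spelling; 3.0.0.b1 is only an acceptable non-canonical form that packaging tools normalize. A stdlib sketch of that normalization rule (simplified; the real rule in packaging.version also handles aliases like "beta" and other segments):

```python
import re

def canonicalize_pre(version: str) -> str:
    """Drop the separator before a pre-release tag, per PEP 440
    normalization: '3.0.0.b1' -> '3.0.0b1' (simplified rule)."""
    return re.sub(r"[._-](a|b|rc)(\d+)$", r"\1\2", version)

print(canonicalize_pre("3.0.0.b1"))  # 3.0.0b1
```

Writing the canonical form directly in __init__.py avoids depending on downstream tools to normalize it.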

long_description=""
# ninja build does not work unless include_dirs are abs path
this_dir = os.path.dirname(os.path.abspath(__file__))

Suggested change
if not os.path.exists("instantiations"):
subprocess.run(["python", os.path.join(this_dir, "generate_kernels.py"), "-o", "instantiations"], check=True)
else:
print("Instantiations directory exists, skipping kernel generation.")
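The check=True in the suggested change is what turns a failed kernel-generation step into a hard build error instead of a silent no-op. A small demonstration of that behavior (generic subprocess usage, not the PR's code):

```python
import subprocess
import sys

def run_checked(cmd):
    """Run a command; return 0 on success, its exit code on failure.

    With check=True, subprocess.run raises CalledProcessError on a
    non-zero exit instead of returning quietly."""
    try:
        subprocess.run(cmd, check=True)
        return 0
    except subprocess.CalledProcessError as e:
        return e.returncode

rc = run_checked([sys.executable, "-c", "import sys; sys.exit(3)"])
print(rc)  # 3
```

Without check=True, a broken generate_kernels.py would leave the instantiations directory empty and the failure would only show up later, at link or import time.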


@zhangboSJTU zhangboSJTU left a comment

LGTM. It has already passed both accuracy and performance tests. Thanks a lot; I will close #97.

@zhangboSJTU

zhangboSJTU commented Dec 30, 2025

I pulled the code that is marked ready for review and ran the simplest test, the same as https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/nn/functional/flash_attention.py#L1346, by adding that code to the end of the csrc/flashmask_v2/flashmask_interface.py file. After pip install -e . I ran python flashmask_interface.py and got the same runtime error:

Error: Failed to initialize the TMA descriptor 700
CUDA error (/xxx/flash-attention/csrc/flashmask_v2/flash_fwd_launch_template.h:218): an illegal memory access was encountered

I built paddlepaddle and ran the same test with paddle. I found that it goes to the _C.flashmask function rather than the _C.flashmask_v2 function ...
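The v1-vs-v2 dispatch mix-up described above reduces to a tiny Python sketch (the class and names below are stand-ins, not Paddle's actual bindings):

```python
class FakeBindings:
    """Stand-in for a C extension module exposing both ops."""

    @staticmethod
    def flashmask(*args):
        return "v1"

    @staticmethod
    def flashmask_v2(*args):
        return "v2"

def call_flashmask(bindings, use_v2=True):
    # The Python wrapper must select the v2 symbol explicitly;
    # falling through to the old name silently runs the v1 kernel.
    op = getattr(bindings, "flashmask_v2" if use_v2 else "flashmask")
    return op()

print(call_flashmask(FakeBindings))  # v2
```

If the wrapper in flash_attention.py still resolves the old symbol, the new kernels are never exercised, which would explain tests passing against the wrong implementation.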
