Fix issue in recompiling kernel with double GRF mode.#4287
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR fixes a shadowing issue with the debug flag and ensures that the rebuilt kernel in large GRF mode is correctly returned, with proper cleanup and final logging.
- Extracts the
debugEnabledoptional once at function scope to avoid shadowing in the try block - Renames the inner compile results to
*_dgrfvariants and swaps them with the outer values when beneficial - Adds cleanup of the unused Level Zero module/kernel and a final debug log after recompilation
Comments suppressed due to low confidence (3)
third_party/intel/backend/driver.c:240
- Checking the optional
debugEnabledwithif (debugEnabled)only tests for presence of a value, not its truth. Useif (debugEnabled.value_or(false))to correctly guard debug logs.
if (debugEnabled)
third_party/intel/backend/driver.c:230
- [nitpick] Variable
debugEnableduses camelCase, which is inconsistent with the snake_case naming (e.g.,n_spills) used elsewhere in this file. Consider renaming todebug_enabledfor consistency.
const std::optional<bool> debugEnabled =
third_party/intel/backend/driver.c:239
- The new recompilation branch for large GRF mode isn’t covered by existing tests. Add a unit or integration test that simulates
n_spills > max_reg_spillto verify the fallback and cleanup behavior.
if (!is_GRF_mode_specified && n_spills > max_reg_spill) {
|
I am confused - looking through the code right now it looks like the automatic failover for the case when number of spills exceeds a certain threshold is not working because new I also think we need to find a way to unit test this code - especially if it is true that it has been broken all this time. |
| std::cout << "(I): Kernel has now " << n_spills << " spills" | ||
| std::cout << "(I): Kernel has now " << n_spills_dgrf << " spills" | ||
| << std::endl; | ||
| if (n_spills_dgrf < n_spills) { |
There was a problem hiding this comment.
Under what circumstances would this happen?
There was a problem hiding this comment.
I would think in most cases number of spills should be less with double GRF, e.g., GEMM.
There was a problem hiding this comment.
For most case the condition is true always. But it is just for some conner case we don't know yet.
There was a problem hiding this comment.
I don’t think we should be adding this logic for corner cases we don’t know about. For one thing, we can’t test it. For another, it introduces a bunch of operations that involve containers we don’t control and can’t introspect. This opens us up to gnarly bugs in the future.
There was a problem hiding this comment.
Make sense. We'd better not to over protect the in-possible case.
|
It looks to me that it didn't work, yup.
I think that would be the case even without the new behavior. I would only question the check. |
|
Any performance impact on Triton Benchmarks ? |
b4c0fca to
ed3605b
Compare
I kick off a benchmark action. https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/15248204645 |
Ok. Please confirm whether there are performance degradations or not. |
|
I still don’t understand if the current automatic large GRF mode is currently broken, or not. If it’s broken then we should see some impact in the benchmarks, no? Regardless I don’t think we should land this until we have a test included so we don’t go through this again. |
There is no performance regression with the PR. Observe Performance improvement on two benchmarks: The main reason is that the double GRF is not enabled in those two benchmarks original. |
The current automatic large GRF mode is broken. We observed performance increased of two benchmark which is not enable the double GRF by default. |
ed3605b to
ccda9d2
Compare
Signed-off-by: Lu,Chengjun <chengjun.lu@intel.com>
alexbaden
left a comment
There was a problem hiding this comment.
Thanks for adding the test and clarifying the problem - this all makes sense now, the code block handling more spills under double GRF was confusing me.
It is interesting that we didn't see much performance impact from breaking this change in the benchmarks. I suppose that is because we set the GRF mode explicitly as part of the auto-tuner config in most of the benchmarks. On the NVIDIA side the number of registers used is returned as a tuning parameter for upstream libraries (e.g. Inductor) to use. I wonder if we could re-work this feature to do something similar. Do other libraries (e.g. vLLM) use n_regs returned by NVIDIA as a tuning parameter?
| std::cout << "(I): Kernel has now " << n_spills_dgrf << " spills" | ||
| << std::endl; | ||
|
|
||
| std::swap(l0_module, l0_module_dgrf); |
There was a problem hiding this comment.
I don't think this swap is a good idea. It would be better to encapsulate everything in a struct and use RAII to destroy, vs the swap and manual destroy. If we want to make an issue and do this as a follow-up I would be ok with that, though.
|
|
||
| // clean up the unused module and kernel. | ||
| auto error_no = zeKernelDestroy(l0_kernel_dgrf); | ||
| if (error_no != ZE_RESULT_SUCCESS) { |
There was a problem hiding this comment.
Let's return the error code with the error message.
There was a problem hiding this comment.
Let's use this issue to track this #4334
We can discus about the idea in tech meeting. From my aspect, I think the |
|
Right, we can't just return |


The name used in try-catch block shadows the names in outside block. Failed to return the new kernel on exit of the try-catch block. Use the new name inside the try-catch block can fix the issue.