Description
Building on top of intel/llvm#12604 + #1318, which add handleOutOfResources to dpcpp and return UR_RESULT_ERROR_OUT_OF_RESOURCES, the local mem size check:
unified-runtime/source/adapters/cuda/enqueue.cpp
Lines 294 to 298 in f086f36
if (LocalSize > static_cast<uint32_t>(Device->getMaxCapacityLocalMem())) {
  setErrorMessage("Excessive allocation of local memory on the device",
                  UR_RESULT_ERROR_ADAPTER_SPECIFIC);
  return UR_RESULT_ERROR_ADAPTER_SPECIFIC;
}
should also return UR_RESULT_ERROR_OUT_OF_RESOURCES and have a dedicated error handling case added in handleOutOfResources.
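A minimal, self-contained sketch of the proposed change to the check is shown below. It is not the actual adapter code: `MockDevice`, `checkLocalMemSize`, and the local `ur_result_t` enum are hypothetical stand-ins so the snippet compiles on its own, and the enumerator values do not match the real Unified Runtime headers. Whether the descriptive message stays in the adapter or moves into handleOutOfResources is left open here.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the real UR result type; only the names mirror the issue,
// the numeric values are illustrative.
enum ur_result_t {
  UR_RESULT_SUCCESS,
  UR_RESULT_ERROR_OUT_OF_RESOURCES,
  UR_RESULT_ERROR_ADAPTER_SPECIFIC
};

// Hypothetical device stub exposing only the capacity query used by the check.
struct MockDevice {
  int MaxCapacityLocalMem;
  int getMaxCapacityLocalMem() const { return MaxCapacityLocalMem; }
};

// Hypothetical helper isolating the local mem size check. Instead of the
// adapter-specific error, it reports the failure as out-of-resources so the
// caller can route it to a dedicated handler.
ur_result_t checkLocalMemSize(const MockDevice *Device, uint32_t LocalSize) {
  if (LocalSize > static_cast<uint32_t>(Device->getMaxCapacityLocalMem())) {
    return UR_RESULT_ERROR_OUT_OF_RESOURCES;
  }
  return UR_RESULT_SUCCESS;
}
```

With a stub device whose capacity is 49152 bytes, a request of exactly 49152 bytes would pass the check and a request of 49153 bytes would be rejected with UR_RESULT_ERROR_OUT_OF_RESOURCES.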
Right now, submitting a kernel with too large a local mem size results in:
Native API failed. Native API returns: -996 (The plugin has emitted a backend specific error)
Excessive allocation of local memory on the device
-996 (The plugin has emitted a backend specific error)
which does contain a helpful exception message, but it is wrapped in generic and confusing "backend specific error" messages and the unhelpful code -996. Having this return ERROR_OUT_OF_RESOURCES would make it easier for us to cover in the troubleshooting guide, and for users to find it with web search engines.