
[SYCL] Reduce urKernelRetain, Release calls when not using kernel bundle or RTC #18324


Conversation

uditagarwal97
Contributor

@uditagarwal97 uditagarwal97 commented May 5, 2025

Problem
Every time we fetch a kernel/program from the in-memory cache, we make a call to urKernelRetain/urKernelRelease/urProgramRetain/urProgramRelease. These calls are expensive.

Solution (Proposed in this PR)

Case 1 (When in-memory cache is enabled):

Instead of shared ownership, transfer ownership of kernel/program handles to the cache. The cache manages the lifetime of program/kernel handles, so there are no more calls to urKernelRetain/Release when fetching items from the cache. During program shutdown, the cache is destroyed and the kernel and program objects it holds are released along with it.

  • (a) Caveat (when eviction is used or the cache has to be cleared due to OOM):
    Consider the case where one thread is using a kernel/program handle fetched from the cache while another thread evicts that entry, perhaps due to OOM. We could then run into a use-after-free bug. To prevent that, this PR guards cache access with a reader-writer lock: threads fetching from the cache hold a reader lock, released once the kernel/program handles are no longer needed, while a thread evicting from the cache holds a writer lock.

Case 2 (When in-memory cache is explicitly disabled):

When the cache is disabled, there is no transfer of ownership to the cache (obviously) and the code behaves exactly as it does today.

@uditagarwal97 uditagarwal97 force-pushed the udit/remove_ur_kernel_retain_call branch from 5248a81 to 189534d Compare May 28, 2025 03:14
@uditagarwal97
Contributor Author

@Alexandr-Konovalov FYI. PR is ready for review.

@uditagarwal97 uditagarwal97 marked this pull request as ready for review May 28, 2025 05:15
@uditagarwal97 uditagarwal97 requested a review from a team as a code owner May 28, 2025 05:15
@uditagarwal97
Contributor Author

uditagarwal97 commented May 28, 2025

Performance: https://intel.github.io/llvm/benchmarks/?runs=Baseline_PVC_L0%2CPR18324_PVC_L0 (max 3-4% improvement)

@uditagarwal97 uditagarwal97 requested a review from vinser52 May 28, 2025 05:39
Contributor

@vinser52 vinser52 left a comment


Did I understand correctly that the RW lock you introduced covers the whole cache, not a single cache entry? How does this approach compare to #18565?

// (6) The writer is only allowed to delete an entry; we do not support
// modifying an entry in the cache.

void acquireReaderLock() {
Contributor

Why do you need to invent your own RW lock?

Contributor Author

IIUC, the closest thing to a reader/writer lock in C++ is std::shared_lock (https://en.cppreference.com/w/cpp/thread/shared_lock.html) together with std::unique_lock. Both operate over a mutex. In my implementation, I've used an atomic variable instead, which I expect to be faster than a mutex here because contention between threads is low (we'll evict from the cache rarely). In my understanding, for simple atomic-counter-like uses, std::atomic performs better than a mutex, since the former can leverage HW support for atomic ops while a mutex may also require OS support (like the futex syscall on Linux).
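A minimal sketch of the idea described above: a spin reader-writer lock built on a single `std::atomic` counter, with a high bit reserved for the writer. This illustrates the technique, not the PR's actual implementation; all names are hypothetical.

```cpp
#include <atomic>
#include <thread>

// Readers bump a counter; the (rare) writer claims a "writer pending" bit and
// spins until in-flight readers drain. New readers back off while the bit is
// set, so the writer cannot be starved indefinitely by a reader stream.
class SpinRWLock {
  static constexpr unsigned WriterBit = 1u << 31;
  std::atomic<unsigned> State{0}; // low bits = number of active readers

public:
  void acquireReaderLock() {
    for (;;) {
      unsigned S = State.load(std::memory_order_relaxed);
      if (S & WriterBit) {             // a writer is pending/active: back off
        std::this_thread::yield();
        continue;
      }
      if (State.compare_exchange_weak(S, S + 1, std::memory_order_acquire))
        return;                        // registered as a reader
    }
  }
  void releaseReaderLock() { State.fetch_sub(1, std::memory_order_release); }

  void acquireWriterLock() {
    // Claim the writer bit (excludes other writers and new readers) ...
    for (;;) {
      unsigned S = State.load(std::memory_order_relaxed);
      if (S & WriterBit) {
        std::this_thread::yield();
        continue;
      }
      if (State.compare_exchange_weak(S, S | WriterBit,
                                      std::memory_order_acquire))
        break;
    }
    // ... then wait for the reader count to reach zero.
    while (State.load(std::memory_order_acquire) != WriterBit)
      std::this_thread::yield();
  }
  void releaseWriterLock() {
    State.fetch_sub(WriterBit, std::memory_order_release);
  }
};
```

In the uncontended reader path this costs one compare-exchange to acquire and one atomic subtract to release, which is the low-overhead case the comment above is optimizing for.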

@@ -2703,6 +2703,8 @@ void enqueueImpKernel(

std::shared_ptr<kernel_impl> SyclKernelImpl;
std::shared_ptr<device_image_impl> DeviceImageImpl;
// Transfer ownership only if cache is enabled.
const bool TransferownerShipToCache = SYCLConfig<SYCL_CACHE_IN_MEM>::get();
Contributor

Should it be TransferOwnershipToCache instead of TransferownerShipToCache?

@@ -2703,6 +2703,8 @@ void enqueueImpKernel(

std::shared_ptr<kernel_impl> SyclKernelImpl;
std::shared_ptr<device_image_impl> DeviceImageImpl;
// Transfer ownership only if cache is enabled.
const bool TransferownerShipToCache = SYCLConfig<SYCL_CACHE_IN_MEM>::get();
Contributor

Also, I think this breaks encapsulation: the cache config should be read inside the cache implementation, not by the caller of the cache.

@vinser52
Contributor

Performance: https://intel.github.io/llvm/benchmarks/?runs=Baseline_PVC_L0%2CPR18324_PVC_L0 (max 3-4% improvement)

We need to test with the v2 adapter.

@vinser52
Contributor

As we agreed at the meeting, we will proceed with the approach in #18565.

@uditagarwal97 uditagarwal97 deleted the udit/remove_ur_kernel_retain_call branch May 28, 2025 16:04