Skip to content

[SYCL] Use shared_ptr instead of manual changing UR counters #18565

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

Alexandr-Konovalov
Copy link
Contributor

@Alexandr-Konovalov Alexandr-Konovalov commented May 20, 2025

Keep kernel and program handle in a structure that lifetime is controlled by shared_ptr. This is faster wrt current implementation, because only one atomic operation is required for copying-destroying shared_ptr, while pair of kernel/program retain/release calls requires 2 atomic operations in the best case.

Keep kernel and program handle in a structure with lifetime controlled by
shared_ptr. This is faster wrt current implementation, because only one atomic
operation is required for copying-destroying shared_ptr, while pair of
kernel/program retain/release requires at least 2 atomic operations.

Signed-off-by: Alexandr Konovalov <[email protected]>
Signed-off-by: Alexandr Konovalov <[email protected]>
Signed-off-by: Alexandr Konovalov <[email protected]>
Signed-off-by: Alexandr Konovalov <[email protected]>
@vinser52 vinser52 requested a review from sergey-semenov May 26, 2025 08:31
@vinser52
Copy link
Contributor

vinser52 commented Jun 2, 2025

@intel/dpcpp-nativecpu-reviewers please review.

@vinser52 vinser52 added the performance Performance related issues label Jun 2, 2025
@hvdijk
Copy link
Contributor

hvdijk commented Jun 2, 2025

I'm not sure why NativeCPU needs to review this. The diff shows sycl/test/native_cpu/multiple_definitions.cpp being added, but that isn't from this PR, that's from #18726. What happened to cause it to show up here?

@Alexandr-Konovalov
Copy link
Contributor Author

I'm not sure why NativeCPU needs to review this. The diff shows sycl/test/native_cpu/multiple_definitions.cpp being added, but that isn't from this PR, that's from #18726. What happened to cause it to show up here?

Sorry, my mistake. Removed.

@vinser52 vinser52 removed the request for review from a team June 2, 2025 13:51

struct FastKernelCacheVal {
ur_kernel_handle_t MKernelHandle; /* UR kernel handle pointer. */
std::mutex *MMutex; /* Mutex guarding this kernel. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be a reference?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly, no, as we pass nullptr, when caching is disabled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest making such a change in a separate PR if we agree it is needed. This PR just changes the ref counting schema for the cache, but not the data structure fields.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly, no, as we pass nullptr, when caching is disabled.

This is important, can you mention that in the in-code comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly, no, as we pass nullptr, when caching is disabled.

This is important, can you mention that in the in-code comment?

Done.

const KernelArgMask *MKernelArgMask; /* Eliminated kernel argument mask. */
ur_program_handle_t MProgramHandle; /* UR program handle corresponding to
this kernel. */
const Adapter *MAdapterPtr; /* We can keep raw pointer to the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

MKernelArgMask(KernelArgMask), MProgramHandle(ProgramHandle),
MAdapterPtr(AdapterPtr) {}

~FastKernelCacheVal() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we set to nullptr after "Release"? It should be relatively cheap but can save lots of time debugging obscure errors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines +2729 to +2733
KernelCacheVal = detail::ProgramManager::getInstance().getOrCreateKernel(
ContextImpl, DeviceImpl, KernelName, KernelNameBasedCachePtr, NDRDesc);
Kernel = KernelCacheVal->MKernelHandle;
KernelMutex = KernelCacheVal->MMutex;
Program = KernelCacheVal->MProgramHandle;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How often do we need to extract data like that? Would enabling structured bindings for this class make sense?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even in that particular code snippet, we cannot use structured bindings because the Kernel, KernelMutex, Program, and EliminatedArgMask variables are declared somewhere above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And additionally, reference counting is now supported by shared_ptr, so we have to keep shared_ptr, to not get race condition (use handler after decrementing counters).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't std::tie allow to use them?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure, but I do not think so. As I understand the KernelCacheVal should be convertible to std::tuple.
Anyway, let's keep it out of scope for this PR.

@aelovikov-intel
Copy link
Contributor

I haven't looked through this in details, so won't be LGTM'ing, but I'm not having any outstanding requests either. Feel free to merge once all is green.

@Alexandr-Konovalov
Copy link
Contributor Author

Colleagues @fabiomestre , @sergey-semenov, @againull , could it be merged or some further improvements are required?

@aelovikov-intel aelovikov-intel merged commit 47f5338 into intel:sycl Jun 2, 2025
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants