Skip to content

[UR][SYCL] Add urUSMContextMemcpyExp API to enable device global support. #17268

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 28 commits into from
Jun 5, 2025
Merged
Show file tree
Hide file tree
Changes from 19 commits
Commits
Show all changes
28 commits
Select commit Hold shift + click to select a range
db41224
[UR] Add urUSMContextMemcpyExp API and basic l0 implementation.
aarongreig Mar 3, 2025
6dd8372
Add cts tests.
aarongreig Mar 7, 2025
acf53ad
Fix typo
aarongreig Mar 7, 2025
7629c48
Merge branch 'sycl' into aaron/usmContextMemcpy
aarongreig Mar 7, 2025
43a3123
Add missing newline
aarongreig Mar 7, 2025
653c5b3
Correct linkage of l0 implementation.
aarongreig Mar 7, 2025
505c7ce
Merge branch 'sycl' into aaron/usmContextMemcpy
aarongreig Mar 11, 2025
282b3d2
Merge branch 'sycl' into aaron/usmContextMemcpy
aarongreig Mar 13, 2025
3802c4f
Add missing entry for l0v2.
aarongreig Mar 13, 2025
f702bb1
Merge branch 'sycl' into aaron/usmContextMemcpy
aarongreig Mar 14, 2025
1a4e90c
Use new API to avoid temporary queue in ext_oneapi_get_device_global_…
aarongreig Mar 14, 2025
56c192a
Merge branch 'sycl' into aaron/usmContextMemcpy
aarongreig Mar 14, 2025
874df40
Add l0 v2 implementation.
aarongreig Mar 14, 2025
2fb988f
Add back deleted newline.
aarongreig Mar 14, 2025
04c9156
Merge branch 'sycl' into aaron/usmContextMemcpy
aarongreig Mar 17, 2025
ed29b77
Address review feedback.
aarongreig Mar 17, 2025
7a08d66
Merge branch 'sycl' into aaron/usmContextMemcpy
aarongreig Mar 17, 2025
e034023
Fix bad merge.
aarongreig Mar 17, 2025
27c9cb6
actually fix hip this time
aarongreig Mar 17, 2025
55ce78a
Merge branch 'sycl' into aaron/usmContextMemcpy
aarongreig Mar 21, 2025
c88f4f9
Merge branch 'sycl' into aaron/usmContextMemcpy
aarongreig Mar 26, 2025
64a2c91
Merge branch 'sycl' into aaron/usmContextMemcpy
aarongreig Mar 27, 2025
d1c5f88
Merge branch 'sycl' into aaron/usmContextMemcpy
aarongreig Mar 31, 2025
ea0e9b9
Merge branch 'sycl' into aaron/usmContextMemcpy
aarongreig Apr 11, 2025
573f8f6
Merge branch 'sycl' into aaron/usmContextMemcpy
aarongreig Jun 3, 2025
6c2eb44
Correct for some recent updates.
aarongreig Jun 3, 2025
7cbd224
Merge branch 'sycl' into aaron/usmContextMemcpy
aarongreig Jun 4, 2025
bb64e3a
Fix test cmake.
aarongreig Jun 4, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
42 changes: 42 additions & 0 deletions sycl/source/detail/device_global_map_entry.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -90,6 +90,48 @@ DeviceGlobalUSMMem &DeviceGlobalMapEntry::getOrAllocateDeviceGlobalUSM(
return NewAlloc;
}

DeviceGlobalUSMMem &
DeviceGlobalMapEntry::getOrAllocateDeviceGlobalUSM(const context &Context) {
assert(!MIsDeviceImageScopeDecorated &&
"USM allocations should not be acquired for device_global with "
"device_image_scope property.");
const std::shared_ptr<context_impl> &CtxImpl = getSyclObjImpl(Context);
const std::shared_ptr<device_impl> &DevImpl =
getSyclObjImpl(CtxImpl->getDevices().front());
std::lock_guard<std::mutex> Lock(MDeviceToUSMPtrMapMutex);

auto DGUSMPtr = MDeviceToUSMPtrMap.find({DevImpl.get(), CtxImpl.get()});
if (DGUSMPtr != MDeviceToUSMPtrMap.end())
return DGUSMPtr->second;

void *NewDGUSMPtr = detail::usm::alignedAllocInternal(
0, MDeviceGlobalTSize, CtxImpl.get(), DevImpl.get(),
sycl::usm::alloc::device);

auto NewAllocIt = MDeviceToUSMPtrMap.emplace(
std::piecewise_construct,
std::forward_as_tuple(DevImpl.get(), CtxImpl.get()),
std::forward_as_tuple(NewDGUSMPtr));
assert(NewAllocIt.second &&
"USM allocation for device and context already happened.");
DeviceGlobalUSMMem &NewAlloc = NewAllocIt.first->second;

// C++ guarantees members appear in memory in the order they are declared,
// so since the member variable that contains the initial contents of the
// device_global is right after the usm_ptr member variable we can do
// some pointer arithmetic to memcopy over this value to the usm_ptr. This
// value inside of the device_global will be zero-initialized if it was not
// given a value on construction.
MemoryManager::context_copy_usm(
reinterpret_cast<const void *>(
reinterpret_cast<uintptr_t>(MDeviceGlobalPtr) +
sizeof(MDeviceGlobalPtr)),
CtxImpl, MDeviceGlobalTSize, NewAlloc.MPtr);

CtxImpl->addAssociatedDeviceGlobal(MDeviceGlobalPtr);
return NewAlloc;
}

void DeviceGlobalMapEntry::removeAssociatedResources(
const context_impl *CtxImpl) {
std::lock_guard<std::mutex> Lock{MDeviceToUSMPtrMapMutex};
Expand Down
5 changes: 5 additions & 0 deletions sycl/source/detail/device_global_map_entry.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -112,6 +112,11 @@ struct DeviceGlobalMapEntry {
DeviceGlobalUSMMem &
getOrAllocateDeviceGlobalUSM(const std::shared_ptr<queue_impl> &QueueImpl);

// This overload allows the allocation to be initialized without a queue. The
// UR adapter in use must report true for
// UR_DEVICE_INFO_USM_CONTEXT_MEMCPY_SUPPORT_EXP to take advantage of this.
DeviceGlobalUSMMem &getOrAllocateDeviceGlobalUSM(const context &Context);

// Removes resources for device_globals associated with the context.
void removeAssociatedResources(const context_impl *CtxImpl);

Expand Down
22 changes: 15 additions & 7 deletions sycl/source/detail/kernel_bundle_impl.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -891,13 +891,21 @@ class kernel_bundle_impl {
"'device_image_scope' property");
}

// TODO: Add context-only initialization via `urUSMContextMemcpyExp` instead
// of using a throw-away queue.
queue InitQueue{MContext, Dev};
auto &USMMem =
Entry->getOrAllocateDeviceGlobalUSM(getSyclObjImpl(InitQueue));
InitQueue.wait_and_throw();
return USMMem.getPtr();
const auto &DeviceImpl = getSyclObjImpl(Dev);
bool SupportContextMemcpy = false;
DeviceImpl->getAdapter()->call<UrApiKind::urDeviceGetInfo>(
DeviceImpl->getHandleRef(),
UR_DEVICE_INFO_USM_CONTEXT_MEMCPY_SUPPORT_EXP,
sizeof(SupportContextMemcpy), &SupportContextMemcpy, nullptr);
if (SupportContextMemcpy) {
return Entry->getOrAllocateDeviceGlobalUSM(MContext).getPtr();
} else {
queue InitQueue{MContext, Dev};
auto &USMMem =
Entry->getOrAllocateDeviceGlobalUSM(getSyclObjImpl(InitQueue));
InitQueue.wait_and_throw();
return USMMem.getPtr();
}
}

size_t ext_oneapi_get_device_global_size(const std::string &Name) {
Expand Down
10 changes: 10 additions & 0 deletions sycl/source/detail/memory_manager.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -937,6 +937,16 @@ void MemoryManager::copy_usm(const void *SrcMem, QueueImplPtr SrcQueue,
DepEvents.data(), OutEvent);
}

void MemoryManager::context_copy_usm(const void *SrcMem, ContextImplPtr Context,
size_t Len, void *DstMem) {
if (!SrcMem || !DstMem)
throw exception(make_error_code(errc::invalid),
"NULL pointer argument in memory copy operation.");
const AdapterPtr &Adapter = Context->getAdapter();
Adapter->call<UrApiKind::urUSMContextMemcpyExp>(Context->getHandleRef(),
DstMem, SrcMem, Len);
}

void MemoryManager::fill_usm(void *Mem, QueueImplPtr Queue, size_t Length,
const std::vector<unsigned char> &Pattern,
std::vector<ur_event_handle_t> DepEvents,
Expand Down
3 changes: 3 additions & 0 deletions sycl/source/detail/memory_manager.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -143,6 +143,9 @@ class MemoryManager {
ur_event_handle_t *OutEvent,
const detail::EventImplPtr &OutEventImpl);

static void context_copy_usm(const void *SrcMem, ContextImplPtr Context,
size_t Len, void *DstMem);

static void fill_usm(void *DstMem, QueueImplPtr Queue, size_t Len,
const std::vector<unsigned char> &Pattern,
std::vector<ur_event_handle_t> DepEvents,
Expand Down
55 changes: 54 additions & 1 deletion unified-runtime/include/ur_api.h

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions unified-runtime/include/ur_api_funcs.def

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

6 changes: 6 additions & 0 deletions unified-runtime/include/ur_ddi.h

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

10 changes: 10 additions & 0 deletions unified-runtime/include/ur_print.h

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

52 changes: 52 additions & 0 deletions unified-runtime/include/ur_print.hpp

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading