
Commit d59818c

Feat: add basic static shared memory support
1 parent db6f72a commit d59818c

3 files changed: +53 -1 lines changed


crates/cuda_std/CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -12,3 +12,4 @@ and advanced users.
 - Added `is_in_address_space`
 - Added `convert_generic_to_specific_address_space`
 - Added `convert_specific_address_space_to_generic`
+- Added basic static shared memory support with `cuda_std::shared_array`.

crates/cuda_std/src/lib.rs

Lines changed: 2 additions & 1 deletion
@@ -22,7 +22,7 @@
 //! structures as well as common imports such as [`thread`].
 
 #![cfg_attr(
-    any(target_arch = "nvptx", target_arch = "nvptx64"),
+    target_os = "cuda",
     no_std,
     feature(register_attr, alloc_error_handler, asm, link_llvm_intrinsics),
     register_attr(nvvm_internal)
@@ -39,6 +39,7 @@ pub mod misc;
 // WIP
 // pub mod rt;
 pub mod ptr;
+pub mod shared;
 pub mod thread;
 pub mod warp;
 
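
The `lib.rs` hunk above swaps the `target_arch` check for `target_os = "cuda"`. A minimal sketch of how that predicate behaves, assuming the device build uses the `nvptx64-nvidia-cuda` target (whose OS component is `cuda`); the function names `compiled_for` and `device_only` are hypothetical and not part of `cuda_std`:

```rust
/// Reports which side of the build this code was compiled for.
/// `cfg!` is evaluated at compile time, so the untaken branch is dead code.
pub fn compiled_for() -> &'static str {
    if cfg!(target_os = "cuda") {
        "device"
    } else {
        "host"
    }
}

/// Hypothetical device-only item: it exists only when the target OS is `cuda`,
/// so host builds never see it.
#[cfg(target_os = "cuda")]
pub fn device_only() {}
```

On an ordinary host build, `compiled_for()` returns `"host"` and `device_only` is not compiled at all, which is the same gating the crate-level `cfg_attr` applies to `no_std` and the nightly features.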

crates/cuda_std/src/shared.rs

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+//! Shared memory handling. Currently only macros.
+
+/// Statically allocates a buffer large enough for `len` elements of `array_type`, yielding
+/// a `*mut array_type` that points to uninitialized shared memory. `len` must be a constant expression.
+///
+/// Note that this allocates the memory __statically__: it expands to a static in the `shared` address space.
+/// Therefore, invoking this macro repeatedly at one call site (e.g. in a loop) always yields the same buffer.
+/// However, separate invocations of the macro yield different buffers.
+///
+/// The data is uninitialized by default, so you must be careful not to read it before it has been written.
+/// The semantics of what "uninitialized" actually means on the GPU (i.e. whether a read yields unknown data or is UB outright)
+/// are not well known, so even if the type is valid for any backing memory, make sure not to read uninitialized data.
+///
+/// # Safety
+///
+/// Shared memory usage is fundamentally unsafe and its correctness cannot be proven statically, so
+/// the burden of correctness is on the user. Some of the things you must ensure in your usage of
+/// shared memory are:
+/// - Shared memory is only shared among the threads of a single __thread block__, not the entire device, so it is
+///   unsound to rely on sharing data across more than one block.
+/// - You must write to the shared buffer before reading from it, as the data is uninitialized by default.
+/// - [`thread::sync_threads`](crate::thread::sync_threads) must be called before relying on the results of other
+///   threads; this ensures every thread in the block has reached that point before any continues. For example, call it
+///   between writing to the buffer and reading another thread's data.
+/// - No access may be out of bounds; this usually means making sure the number of threads and their dimensions are correct.
+///
+/// It is suggested to run your executable under `cuda-memcheck` to make sure usages of shared memory are correct.
+///
+/// # Examples
+///
+/// ```no_run
+/// #[kernel]
+/// pub unsafe fn reverse_array(d: *mut i32, n: usize) {
+///     let s = shared_array![i32; 64];
+///     let t = thread::thread_idx_x() as usize;
+///     let tr = n - t - 1;
+///     *s.add(t) = *d.add(t);
+///     thread::sync_threads();
+///     *d.add(t) = *s.add(tr);
+/// }
+/// ```
+#[macro_export]
+macro_rules! shared_array {
+    ($array_type:ty; $len:expr) => {{
+        // the initializer is discarded when declaring shared globals, so it is unimportant.
+        #[$crate::address_space(shared)]
+        static mut SHARED: ::core::mem::MaybeUninit<[$array_type; $len]> = ::core::mem::MaybeUninit::uninit();
+        SHARED.as_mut_ptr() as *mut $array_type
+    }};
+}
