Add a GPU BVH (bvh2) builder #853

Draft
jure wants to merge 4 commits into gkjohnson:master from jure:gpu_bvh_builder

jure commented Feb 21, 2026

This PR adds an H-PLOC-like GPU BVH2 builder for WebGPU and wires it into new and updated WebGPU examples. It's still a draft: there's a lot going on and it definitely needs discussion. :)

Adds:

  • GPUMeshBVH with GPU build/refit paths.
  • build from CPU geometry and directly from GPU buffers.
  • the OneSweep sorter
  • BVH2 traversal support
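For readers new to this family of builders: PLOC-style algorithms typically start by ordering primitives along a Morton (Z-order) curve, which is where a radix sorter such as OneSweep comes in. The following is a minimal CPU-side sketch of that first stage only, not the PR's WGSL code. The function names (`expandBits`, `morton3D`, `sortByMorton`) are hypothetical, and `Array.prototype.sort` stands in for the GPU radix sort.

```javascript
// Spread the low 10 bits of v so that bit i lands on bit 3 * i.
function expandBits( v ) {

	v = ( v | ( v << 16 ) ) & 0x030000FF;
	v = ( v | ( v << 8 ) ) & 0x0300F00F;
	v = ( v | ( v << 4 ) ) & 0x030C30C3;
	v = ( v | ( v << 2 ) ) & 0x09249249;
	return v;

}

// 30-bit Morton code for a point with coordinates normalized to [ 0, 1 ].
function morton3D( x, y, z ) {

	// Quantize each coordinate to 10 bits.
	const q = c => Math.min( Math.max( Math.floor( c * 1024 ), 0 ), 1023 );
	return ( ( expandBits( q( x ) ) << 2 ) | ( expandBits( q( y ) ) << 1 ) | expandBits( q( z ) ) ) >>> 0;

}

// Returns primitive indices ordered along the Morton curve.
// centroids: array of [ x, y, z ]; min / max: scene bounds as [ x, y, z ].
function sortByMorton( centroids, min, max ) {

	const codes = centroids.map( c => {

		// Normalize into the scene bounds, guarding against a zero-size axis.
		const n = c.map( ( v, i ) => max[ i ] > min[ i ] ? ( v - min[ i ] ) / ( max[ i ] - min[ i ] ) : 0 );
		return morton3D( n[ 0 ], n[ 1 ], n[ 2 ] );

	} );

	// A comparison sort stands in for the GPU radix sort here.
	return centroids.map( ( _, i ) => i ).sort( ( a, b ) => codes[ a ] - codes[ b ] );

}
```

After this step, neighbours in the sorted order tend to be neighbours in space, which is what lets a clustering pass search only a small window for merge candidates.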

Performance:

The timings shown in the videos are incorrect: the build and refit are both async, so the displayed times cover submission only. For a ~900k-triangle asset, the BVH builds in ~13 ms and refits in ~5 ms. Timing queries can optionally be enabled in the skinned mesh example, but they add quite a bit of overhead. Generally I think there's room for improvement here, but it gets tricky quite fast with WebGPU's limitations. I do have a BVH validator example locally, which I use to benchmark performance and measure BVH quality, but I'm not sure it's a great fit for the scope of this PR.
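To illustrate the gap between submit time and actual GPU time, here is a rough sketch that waits on `device.queue.onSubmittedWorkDone()` (a standard WebGPU call) before stopping the clock. `measureGpuWork` and the `submitWork` callback are hypothetical names, not part of this PR. The result still includes queue latency, so it is coarser than timestamp queries.

```javascript
// Measures both the CPU-side submit cost and the time until the GPU queue drains.
// device: a GPUDevice (or any object exposing queue.onSubmittedWorkDone()).
// submitWork: hypothetical callback that encodes and submits the build / refit passes.
async function measureGpuWork( device, submitWork ) {

	const start = performance.now();
	submitWork();

	// This is roughly what a naive timer around the build call reports.
	const submitMs = performance.now() - start;

	// Resolves once all work submitted so far has finished executing.
	await device.queue.onSubmittedWorkDone();
	const totalMs = performance.now() - start;

	return { submitMs, totalMs };

}
```

Usage would look something like `const { submitMs, totalMs } = await measureGpuWork( device, () => device.queue.submit( [ encoder.finish() ] ) );`, with `totalMs` being the number closer to the ~13 ms / ~5 ms figures quoted above.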

Documentation: ✅

Examples:

  • example/webgpu_skinnedMesh.html/js (Full GPU skinning + BVH rebuild/refit flow)
Screen.Recording.2026-02-21.at.01.07.51.mov
  • example/webgpu_explodingMesh.html/js (Worst case for refit, makes the case for fast full rebuilds)
Screen.Recording.2026-02-21.at.01.04.00.mov
  • example/webgpu_gpuPathTracingSimple_gpuBuild.html/js (same as the existing one, but just with GPU build)
Screen.Recording.2026-02-21.at.01.17.24.mov
  • TLAS/BLAS example? I have one locally, but maybe too much for repo?
Screen.Recording.2026-02-21.at.01.15.05.mov
  • shared heavy asset (about 900k tris): example/inferno-beast-from-space-from-jurafjvs-cc0-2.glb
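As background on why the exploding mesh is a worst case for refit: a refit keeps the tree topology and only recomputes boxes bottom-up, so triangles that started out as neighbours stay grouped even after they fly apart, and their parent boxes balloon. The sketch below shows that on a tiny CPU-side tree. The flat layout (`bounds`, `left`, `right`) is hypothetical and chosen for brevity; it is not the layout used in this PR.

```javascript
// Hypothetical flat BVH2 layout (not the PR's):
//   bounds: Float32Array, 6 floats per node (min xyz, then max xyz)
//   left / right: Int32Array of child indices; left[ i ] === - 1 marks a leaf
//   whose triangle index is stored in right[ i ].
// Assumes children always have larger indices than their parent,
// so a reverse loop visits children before parents.
// positions: Float32Array with 9 floats per (non-indexed) triangle.
function refit( bvh, positions ) {

	const { bounds, left, right } = bvh;
	for ( let i = left.length - 1; i >= 0; i -- ) {

		const o = 6 * i;
		if ( left[ i ] === - 1 ) {

			// Leaf: recompute the box from the triangle's three vertices.
			const t = 9 * right[ i ];
			for ( let a = 0; a < 3; a ++ ) {

				const v0 = positions[ t + a ], v1 = positions[ t + 3 + a ], v2 = positions[ t + 6 + a ];
				bounds[ o + a ] = Math.min( v0, v1, v2 );
				bounds[ o + 3 + a ] = Math.max( v0, v1, v2 );

			}

		} else {

			// Internal node: union of the two already-refitted child boxes.
			const l = 6 * left[ i ], r = 6 * right[ i ];
			for ( let a = 0; a < 3; a ++ ) {

				bounds[ o + a ] = Math.min( bounds[ l + a ], bounds[ r + a ] );
				bounds[ o + 3 + a ] = Math.max( bounds[ l + 3 + a ], bounds[ r + 3 + a ] );

			}

		}

	}

}

// Surface area of node i's box, a common proxy for traversal cost.
function surfaceArea( bounds, i ) {

	const o = 6 * i;
	const dx = bounds[ o + 3 ] - bounds[ o ];
	const dy = bounds[ o + 4 ] - bounds[ o + 1 ];
	const dz = bounds[ o + 5 ] - bounds[ o + 2 ];
	return 2 * ( dx * dy + dy * dz + dz * dx );

}
```

Comparing `surfaceArea( bvh.bounds, 0 )` before and after an animation step gives a cheap signal for when a refitted tree has degraded enough that a full rebuild is worth its cost.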


const t = THREE.MathUtils.clamp( kelvin, 1000.0, 40000.0 ) / 100.0;
let r = 255.0;
let g = 255.0;

Check warning (Code scanning / CodeQL): Useless assignment to local variable

The initial value of g is unused, since it is always overwritten.
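
One way to address this kind of warning is to declare the variable without an initializer and assign it in every branch. The sketch below assumes the function follows the widely used Tanner Helland blackbody approximation. That is a guess, since only the three lines above are visible here. `kelvinToRGB` and the local `clamp` helper (standing in for `THREE.MathUtils.clamp`) are illustrative names.

```javascript
// Local stand-in for THREE.MathUtils.clamp so the sketch is self-contained.
const clamp = ( v, lo, hi ) => Math.min( Math.max( v, lo ), hi );

// Approximate blackbody color; returns [ r, g, b ] in the 0..1 range.
// Assumed to mirror the PR's helper, whose full body is not shown.
function kelvinToRGB( kelvin ) {

	const t = clamp( kelvin, 1000.0, 40000.0 ) / 100.0;

	// No initial values: each channel is assigned exactly once per path.
	let r, g, b;

	if ( t <= 66.0 ) {

		r = 255.0;
		g = 99.4708025861 * Math.log( t ) - 161.1195681661;

	} else {

		r = 329.698727446 * Math.pow( t - 60.0, - 0.1332047592 );
		g = 288.1221695283 * Math.pow( t - 60.0, - 0.0755148492 );

	}

	if ( t >= 66.0 ) b = 255.0;
	else if ( t <= 19.0 ) b = 0.0;
	else b = 138.5177312231 * Math.log( t - 10.0 ) - 305.0447927307;

	return [ r, g, b ].map( c => clamp( c, 0.0, 255.0 ) / 255.0 );

}
```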

gkjohnson (Owner) commented Feb 27, 2026

Hey Jure! Thanks for adding this - I'm excited to see this added to the project. There's a lot here so it's going to take me some time to wrap my head around. I'm less familiar with compute-based algorithms for generating BVHs so it would be helpful if you could provide a brief overview of the algorithm and the general flow of functions here. Also -

  • What files should I be starting with to understand the core of the algorithm? It looks like most of the files changed here are related to examples, is that right?
  • It looks like the memory layout of the nodes is different in service of the needs of the algorithm. Can you give a brief explanation of the memory layouts and why they're different?

We've also recently started some work on three-gpu-pathtracer again to convert it to support TSL and WebGPU, so some of my thinking around these things is starting to solidify. In the long term I think it makes sense to support interoperation of the CPU and GPU BVH variants, to support things like upload from CPU and readback from GPU for different use cases. We don't have to support that now, but it's maybe something to keep in mind.

Relating to the path tracer, there's some related work happening over in gkjohnson/three-gpu-pathtracer#713 to support the CPU -> GPU compute use case that might be good to be aware of. Specifically, ObjectBVH and SkinnedMeshBVH classes have recently been added to support a CPU-side TLAS structure, and a "BVHComputeData" class has been added in the pathtracer branch to support packing geometry + BVH data into storage buffers and to allow constructing custom query functions, which will eventually be moved back to this project. (Some TSL utility functions have also been written to enable the kind of generic node construction with WGSL literals needed for the custom query functions.) These storage buffers encode the BVH using the same memory layout used in the CPU BVH buffers, so once both this PR and that one have settled a bit it will probably make sense to meet somewhere in the middle and move their in-memory representations towards each other.
