Hey,
I'm implementing SVGF for a GI effect I'm working on, and I think it would be very suitable for this project as a denoiser. Since it implements temporal reprojection as well, it would also solve issue #60 by optionally leaving out the denoising pass at the end.
I'd like to describe how it works, what is needed to implement it, and how exactly we could implement it in three-gpu-pathtracer.
Related issues: #85 and #60
How SVGF works
I'm using images/videos from a GI effect for demo purposes here.
Raw input frame
Suppose you have rendered the diffuse lighting in half resolution for the current frame:
Temporal accumulation
The first step is to temporally accumulate samples like raytracers usually do, except that it also uses reprojection, so the accumulated render doesn't have to be discarded when the camera moves. This gives us the following result:
The pass compares both the normals and the depths of the current and the reprojected pixel. If the difference in either exceeds a set threshold, we have a disocclusion, meaning the current pixel wasn't visible in the last frame.
In the alpha channel of the temporal reprojection (TR) pass's render target I'm saving the 'age' of a pixel, i.e. for how many frames it has been visible. This age is used to blend each pixel individually: recently disoccluded pixels need to be blended in more aggressively, while pixels that have been visible for a long time barely need new samples blended in.
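As a rough illustration, here's how the disocclusion test and the age-based blending could look in a fragment shader. All texture/uniform names and thresholds below are placeholders of mine, not the actual implementation:

```glsl
uniform sampler2D inputTexture;       // current frame's noisy lighting
uniform sampler2D accumulatedTexture; // rgb = history color, a = age
uniform sampler2D velocityTexture;
uniform sampler2D depthTexture;
uniform sampler2D lastDepthTexture;
uniform sampler2D normalTexture;
uniform sampler2D lastNormalTexture;
uniform float depthThreshold;  // e.g. 0.1
uniform float normalThreshold; // e.g. 0.25

varying vec2 vUv;

void main() {
    vec2 prevUv = vUv - texture2D(velocityTexture, vUv).xy;

    vec3 color = texture2D(inputTexture, vUv).rgb;
    vec4 history = texture2D(accumulatedTexture, prevUv);

    float depth = texture2D(depthTexture, vUv).r;
    float lastDepth = texture2D(lastDepthTexture, prevUv).r;
    vec3 normal = texture2D(normalTexture, vUv).xyz * 2.0 - 1.0;
    vec3 lastNormal = texture2D(lastNormalTexture, prevUv).xyz * 2.0 - 1.0;

    // Disocclusion test: if depth or normal changed too much, the pixel
    // wasn't visible last frame and accumulation restarts.
    bool disoccluded =
        abs(depth - lastDepth) / max(depth, 1e-4) > depthThreshold ||
        (1.0 - dot(normal, lastNormal)) > normalThreshold;

    // 'age' lives in the alpha channel: how long the pixel was visible.
    float age = disoccluded ? 0.0 : history.a + 1.0;

    // Young pixels blend aggressively; old pixels barely change.
    float blend = 1.0 / (age + 1.0);
    gl_FragColor = vec4(mix(history.rgb, color, blend), age);
}
```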
For reprojection it uses both the velocity of a pixel and the length of the reflected ray in the current frame (for glossy surfaces only). Reprojecting the 'hit point' using the ray length is needed because reflections have a different parallax than diffuse lighting, so they can't be correctly reprojected using a pixel's velocity alone. However, hit point reprojection only works well if the surface is flat (which can be determined from a pixel's curvature using screen-space derivatives) and if the surface's roughness is rather low.
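Here's a minimal sketch of what hit point reprojection boils down to; the matrix and parameter names are my assumptions:

```glsl
uniform mat4 prevViewProjectionMatrix; // last frame's camera
uniform vec3 cameraPos;

// A reflection has the parallax of the virtual image behind the surface,
// not of the surface itself, so we reproject that virtual point instead.
vec2 reprojectHitPoint(vec3 surfacePos, float rayLength) {
    vec3 viewDir = normalize(surfacePos - cameraPos);
    vec3 virtualPos = surfacePos + viewDir * rayLength;

    vec4 prevClip = prevViewProjectionMatrix * vec4(virtualPos, 1.0);
    return (prevClip.xy / prevClip.w) * 0.5 + 0.5; // NDC -> UV
}
```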
Denoising
The temporally accumulated texture is then denoised using a smart À-trous blur filter, run over multiple iterations. Since blurring with a large kernel can be very expensive, an À-trous blur filter only takes every i-th pixel into account in its i-th iteration, allowing it to cover larger kernels while maintaining decent performance.
Here's an illustration showing how the blur filter selects its neighbors in iteration i:
That explains why it's called an 'À-trous' ('with holes') blur filter: it skips i neighboring pixels during iteration i. You can use varying kernel sizes for it; the kernel size in the illustration would be 9, for example (the same for every iteration).
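As a sketch, the neighbor selection could look like this in a fragment shader (a 3x3 tap pattern; `stepSize` grows with each iteration, e.g. the SVGF paper uses powers of two; the names are illustrative):

```glsl
uniform sampler2D inputTexture;
uniform vec2 invTexSize; // 1.0 / resolution
uniform float stepSize;  // grows with the iteration index

varying vec2 vUv;

void main() {
    vec4 sum = vec4(0.0);
    float totalWeight = 0.0;

    // 9 taps spaced 'stepSize' pixels apart, leaving the holes that
    // give the filter its name; later iterations cover a wide
    // footprint at constant cost.
    for (int x = -1; x <= 1; x++) {
        for (int y = -1; y <= 1; y++) {
            vec2 offset = vec2(float(x), float(y)) * stepSize * invTexSize;
            vec4 neighbor = texture2D(inputTexture, vUv + offset);

            float w = 1.0; // edge-stopping weights would be computed here
            sum += neighbor * w;
            totalWeight += w;
        }
    }

    gl_FragColor = sum / totalWeight;
}
```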
The blur filter is edge-stopping so that it doesn't overblur. It uses depth similarity, normal similarity, luminance similarity and roughness similarity to weigh neighbors when blurring, preserving details while still getting rid of noise. You can control how strongly the denoiser weighs each similarity, for example to trade some detail for less noise.
One of the most important features is weighing a pixel by its variance, i.e. by how 'noisy' it is over multiple frames (which is estimated in the accumulation pass). The blur filter weighs neighbors based on the center pixel's variance and the neighbors' variances, which makes it deal more aggressively with noisy areas. The variance is recalculated each iteration, so each iteration takes the changes made by the previous ones into account.
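For illustration, the weights could be combined roughly as in the SVGF paper; the sigma values and names below are my assumptions:

```glsl
float edgeStoppingWeight(
    float centerDepth, float neighborDepth,
    vec3 centerNormal, vec3 neighborNormal,
    float centerLum, float neighborLum, float centerVariance
) {
    const float sigmaZ = 1.0;   // depth sensitivity
    const float sigmaN = 128.0; // normal sensitivity
    const float sigmaL = 4.0;   // luminance sensitivity

    float wZ = abs(centerDepth - neighborDepth) / sigmaZ;
    float wN = pow(max(dot(centerNormal, neighborNormal), 0.0), sigmaN);

    // Dividing by the standard deviation relaxes the luminance test in
    // noisy areas, so they get blurred more aggressively.
    float wL = abs(centerLum - neighborLum) /
               (sigmaL * sqrt(centerVariance) + 1e-4);

    return exp(-wZ - wL) * wN;
}
```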
After 3 blur iterations using a kernel size of 7, we get this result:
This gets rid of the remaining noise and helps cover up noisy disocclusions tremendously by denoising them more aggressively.
Video
Here's how everything looks in motion: first the denoised, temporally accumulated output, then just the temporally accumulated output, and finally the raw input frame:
svgf_2.webm
Requirements
SVGF needs the following inputs:
- rendered lighting only (without direct diffuse textures applied; indirect diffuse lighting is still allowed)
- depth
- normals
SVGF will then have the following output:
- denoised lighting
You can then combine that denoised lighting with the direct diffuse textures of the materials to get the final raytraced output.
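As a minimal sketch, that final combine pass could look like this (texture names are assumptions):

```glsl
uniform sampler2D denoisedLightingTexture;
uniform sampler2D diffuseTexture; // direct diffuse (albedo) G-buffer

varying vec2 vUv;

void main() {
    vec3 lighting = texture2D(denoisedLightingTexture, vUv).rgb;
    vec3 albedo = texture2D(diffuseTexture, vUv).rgb;

    // Sharp texture detail is reintroduced only after denoising, so it
    // is never blurred or smeared by the previous passes.
    gl_FragColor = vec4(lighting * albedo, 1.0);
}
```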
Reasons why direct diffuse lighting isn't included:
- for TR, we should only reproject information that can't be computed in the current frame; since direct diffuse textures can be rendered each frame, we shouldn't reproject them at all
- the denoising pass would overblur the direct diffuse textures, resulting in blurriness and loss of detail
Including direct diffuse textures in the temporal accumulation was also the main reason for the smearing and temporal lag in my recent PR (#241). When you want to reproject entire frames and not just lighting, you need more constraining methods than depth/normal comparison, such as 'neighborhood clamping', the usual method to eliminate smearing in TRAA. We can't use neighborhood clamping here, as it only works properly when the full scene information is computed each frame (which is the case in TRAA). For noisy inputs like ours it results in false positives: correctly accumulated pixels get discarded because the neighboring pixels have too much variance.
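For reference, this is roughly what neighborhood clamping looks like in a TRAA resolve (names illustrative), which shows why a noisy 3x3 neighborhood makes the clamp unreliable:

```glsl
uniform sampler2D inputTexture; // current frame
uniform vec2 invTexSize;

// The history sample is clamped to the min/max box of the current 3x3
// neighborhood. With noisy input that box is arbitrary, so valid
// history gets clamped away.
vec3 clampHistory(vec3 history, vec2 uv) {
    vec3 nMin = vec3(1e5);
    vec3 nMax = vec3(-1e5);

    for (int x = -1; x <= 1; x++) {
        for (int y = -1; y <= 1; y++) {
            vec2 offset = vec2(float(x), float(y)) * invTexSize;
            vec3 c = texture2D(inputTexture, uv + offset).rgb;
            nMin = min(nMin, c);
            nMax = max(nMax, c);
        }
    }

    return clamp(history, nMin, nMax);
}
```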
So I'd like to open a PR soon but have a few questions regarding the implementation first:
- How exactly would you combine the direct diffuse textures with the denoised lighting?
- Would it be possible to store the reflected ray length in the alpha channel of your raytraced buffer? If not, then we could disable hit point reprojection and just use reprojection through a pixel's velocity.
References
- Adventures in Hybrid Rendering: great explanation of SVGF and temporal reprojection
- Spatiotemporal Variance-Guided Filtering: Real-Time Reconstruction for Path-Traced Global Illumination
- Edge-Avoiding À-Trous Wavelet Transform for fast Global Illumination Filtering