Description
Is your feature request related to a problem?
In Vulkan, batching is very important: instead of calling a Vulkan function repeatedly, we can sometimes batch all arguments into a single call if the function accepts pointers to arrays of arguments. This is the case for vkCmdPipelineBarrier:
```cpp
void vkCmdPipelineBarrier(
    VkCommandBuffer                 commandBuffer,
    VkPipelineStageFlags            srcStageMask,
    VkPipelineStageFlags            dstStageMask,
    VkDependencyFlags               dependencyFlags,
    uint32_t                        memoryBarrierCount,
    const VkMemoryBarrier*          pMemoryBarriers,
    uint32_t                        bufferMemoryBarrierCount,
    const VkBufferMemoryBarrier*    pBufferMemoryBarriers,
    uint32_t                        imageMemoryBarrierCount,
    const VkImageMemoryBarrier*     pImageMemoryBarriers);
```
Barrier placement is very hard to get right in Vulkan. In general, you want to keep the number of barriers as small as possible, but you also need a minimum number of barriers to ensure correctness. You must also keep the barrier parameters as tight as possible for optimal performance. If you can, you should batch barriers into one call to vkCmdPipelineBarrier, which is the core idea behind this issue.
Description
We could implement a builder pattern which abstracts collecting barriers and placing them. The build method is simply one call to vkCmdPipelineBarrier with all collected barriers batched:
```cpp
vkCmdPipelineBarrier(
    cmd,
    srcStageMask,
    dstStageMask,
    0,
    static_cast<uint32_t>(memoryBarriers.size()), memoryBarriers.data(),
    static_cast<uint32_t>(bufferBarriers.size()), bufferBarriers.data(),
    static_cast<uint32_t>(imageBarriers.size()), imageBarriers.data()
);
```
There are a few things we need to look out for here:
- We would need to reorganize the rendergraph code for updates of buffers or barriers so that the barriers can be batched by type effectively. This will be discussed in another issue.
- We will mainly need the buffer memory barriers and image memory barriers, as raw memory barriers should be avoided (depending on the exact use case).
- Without VK_KHR_synchronization2, the srcStageMask and dstStageMask apply to all barriers in the batch. This is not optimal because individual barriers could require different stage masks, which means we might have to call vkCmdPipelineBarrier repeatedly for every combination of stage masks, even in cases where we could batch more tightly. With sync2, which is part of Vulkan 1.3 core, we can take a different approach:
```cpp
// With VK_KHR_synchronization2, the stage masks are part of the barrier itself
typedef struct VkBufferMemoryBarrier2 {
    VkStructureType          sType;
    const void*              pNext;
    VkPipelineStageFlags2    srcStageMask;
    VkAccessFlags2           srcAccessMask;
    VkPipelineStageFlags2    dstStageMask;
    VkAccessFlags2           dstAccessMask;
    uint32_t                 srcQueueFamilyIndex;
    uint32_t                 dstQueueFamilyIndex;
    VkBuffer                 buffer;
    VkDeviceSize             offset;
    VkDeviceSize             size;
} VkBufferMemoryBarrier2;

void build(VkCommandBuffer cmd) {
    if (memoryBarriers.empty() && bufferBarriers.empty() && imageBarriers.empty())
        return;

    VkDependencyInfo depInfo{};
    depInfo.sType = VK_STRUCTURE_TYPE_DEPENDENCY_INFO;
    depInfo.memoryBarrierCount = static_cast<uint32_t>(memoryBarriers.size());
    depInfo.pMemoryBarriers = memoryBarriers.data();
    depInfo.bufferMemoryBarrierCount = static_cast<uint32_t>(bufferBarriers.size());
    depInfo.pBufferMemoryBarriers = bufferBarriers.data();
    depInfo.imageMemoryBarrierCount = static_cast<uint32_t>(imageBarriers.size());
    depInfo.pImageMemoryBarriers = imageBarriers.data();

    // In summary, this allows for more fine-grained synchronization
    vkCmdPipelineBarrier2(cmd, &depInfo);

    // Clear after use
    memoryBarriers.clear();
    bufferBarriers.clear();
    imageBarriers.clear();
}
```
- Initially, I thought we could record all pipeline barriers into one batched call to vkCmdPipelineBarrier and maybe cache this as a secondary command buffer, which could be reused. The problem is that this is almost impossible, because the buffer memory barriers require the size of the buffer to be specified, and that is not easy to expose as a parameter in a recorded command buffer: after recording, command buffers are immutable.
- There is vkCmdUpdateBuffer, but it is limited in size. From the Vulkan specification: "The additional cost of this functionality compared to buffer to buffer copies means it should only be used for very small amounts of data, and is why it is limited to at most 65536 bytes."
Alternatives
If we don't use a PipelineBarrierBatchBuilder, and if we don't batch any pipeline barriers at all, we might run into serious performance problems at some point. This might not matter for a small renderer, but since we want a scalable engine, it will be important for the future.
Affected Code
The rendergraph and wrapper code for command buffers
Operating System
All
Additional Context
Initially, I thought about introducing this in rendergraph2, but this would be too much for this pull request, which is already very big.