Suppose I run kernels on multiple queues and use memory pools to allocate the output buffers for all of them. Currently, nothing prevents a buffer from being returned to the pool before every kernel using it has finished reading or writing it. Once the buffer is back in the pool, a kernel on another queue may be handed it and overwrite it, with no synchronization to ensure that the kernels on the first queue are done with it.
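One possible mitigation is for the pool to defer reuse. A freed buffer is parked until the completion events of all kernels that touched it have fired, and only then becomes available again. The sketch below models this on the host side only. `Event`, `Buffer` and `DeferredPool` are hypothetical stand-ins, not the API of any real GPU library. It also assumes that every kernel using a buffer records its completion event on that buffer.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical stand-in for a device completion event."""
    done: bool = False

    def complete(self) -> None:
        self.done = True


@dataclass(eq=False)
class Buffer:
    ident: int
    # Completion events of every kernel (on any queue) that reads or writes this buffer.
    pending: list = field(default_factory=list)


class DeferredPool:
    """Hypothetical pool that reuses a freed buffer only after all its recorded events complete."""

    def __init__(self) -> None:
        self._free = []      # buffers that are safe to hand out
        self._deferred = []  # buffers released by the host but possibly still in use on a device
        self._next_id = 0

    def _reclaim(self) -> None:
        # Move deferred buffers whose kernels have all finished into the free list.
        still_busy = []
        for buf in self._deferred:
            if all(ev.done for ev in buf.pending):
                buf.pending.clear()
                self._free.append(buf)
            else:
                still_busy.append(buf)
        self._deferred = still_busy

    def allocate(self) -> Buffer:
        self._reclaim()
        if self._free:
            return self._free.pop()
        self._next_id += 1
        return Buffer(self._next_id)

    def release(self, buf: Buffer) -> None:
        # Do not make the buffer reusable yet; park it until its events complete.
        self._deferred.append(buf)


pool = DeferredPool()
a = pool.allocate()
ev = Event()            # a kernel on queue 1 is still writing to `a`
a.pending.append(ev)
pool.release(a)         # the host drops its reference early
b = pool.allocate()     # queue 2 asks for a buffer and gets a fresh one, not `a`
ev.complete()           # queue 1's kernel finishes
c = pool.allocate()     # only now is `a` handed out again
```

Polling events at allocation time keeps the sketch simple. A real implementation might instead use event callbacks, or have the second queue wait on the first queue's events rather than handing it a fresh buffer.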
cc @VincentWells