[tensor-descriptor]: Improve translation for make_tensor_ptr operations in control flow #4132

@etiotto

Description

The current pass that transforms the tensor descriptor API into block pointers has a limitation: it fails to translate tensor descriptor loads whose descriptor may be created on different control flow paths. Here is an example:

    a_desc = tl.make_tensor_descriptor(
        a_ptr,
        shape=[M, N],
        strides=[N, 1],
        block_shape=[MBLOCK, NBLOCK],
    )

    for i in range(0, N, NBLOCK):
        assert isinstance(a_desc, tl.tensor_descriptor)
        if i % (3 * NBLOCK) == 0:
            a_desc = tl.make_tensor_descriptor(
                a_ptr,
                shape=[M, N],
                strides=[N, 1],
                block_shape=[MBLOCK, NBLOCK],
            )
            assert isinstance(a_desc, tl.tensor_descriptor)
        assert isinstance(a_desc, tl.tensor_descriptor)
        a = a_desc.load([moffset, i])
        a_desc.store([moffset, i], a + 10)

    n = 0

In this example, the a_desc used by the load is selected at runtime, depending on whether the branch inside the loop is taken. The existing pass cannot handle this pattern because it propagates the offsets backward from the load operation to the make_tensor_descriptor operation that defines the descriptor, and here that defining operation is not unique.
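
To make the non-uniqueness concrete, here is a minimal toy model in plain Python. It is not the pass's actual code or data structures: `MakeDesc`, `Merge`, and `reaching_descriptors` are hypothetical names invented for illustration. `Merge` stands in for any value with several possible sources, such as a loop-carried argument or the result of an `if`. Walking backward from the descriptor used by the load reaches two creation sites instead of one.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class MakeDesc:
    """Hypothetical stand-in for a make_tensor_descriptor operation."""
    name: str


@dataclass(eq=False)
class Merge:
    """Hypothetical stand-in for a value with several possible sources
    (a loop-carried argument or the result of an if)."""
    name: str
    incoming: list = field(default_factory=list)


def reaching_descriptors(value):
    """Walk backward from `value` and collect every MakeDesc that may reach it."""
    found, seen, stack = [], set(), [value]
    while stack:
        v = stack.pop()
        if id(v) in seen:  # the loop makes the graph cyclic, so track visits
            continue
        seen.add(id(v))
        if isinstance(v, MakeDesc):
            found.append(v.name)
        else:
            stack.extend(v.incoming)
    return sorted(found)


# Model of the example above (assumed shape, for illustration only).
before_loop = MakeDesc("before_loop")   # descriptor created before the loop
in_branch = MakeDesc("in_branch")       # descriptor created inside the if
loop_arg = Merge("loop_arg")            # a_desc carried across iterations
if_result = Merge("if_result", [in_branch, loop_arg])  # a_desc after the if
loop_arg.incoming = [before_loop, if_result]

# The load uses the value that is live after the if.
sources = reaching_descriptors(if_result)
print(sources)
```

In this model the walk returns both creation sites, so there is no single operation to which the load's offsets could be propagated.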
