
fix: wrong dtype and device in aten.full_like decomposition #3535


Merged
1 commit merged into pytorch:main on May 28, 2025

Conversation

@junstar92 (Contributor) commented May 28, 2025

Description

This PR addresses a bug in the Torch-TensorRT decomposition of torch.ops.aten.full_like.

In the current implementation, the decomposition incorrectly overrides the dtype and device arguments, ignoring explicitly set dtype values and assigning all tensors to the default_device() (typically cuda:0), regardless of the inputs' actual device.

Specifically, the issue occurs in the following decomposition function:

@register_torch_trt_decomposition(
    torch.ops.aten.full_like, registry=TORCH_TRT_DECOMPOSITIONS
)  # type: ignore
def full_like_decomposition(*args, **kwargs) -> torch.Tensor:
    input = args[0]
    shape = args[0].shape
    fill_value = args[1]
    kwargs["dtype"] = input.dtype  # overrides any explicitly passed dtype
    kwargs["device"] = to_torch_device(default_device())  # forces the default device (typically cuda:0)
    return torch.full(shape, fill_value, dtype=kwargs["dtype"], device=kwargs["device"])

This implementation causes two main issues:

  1. Incorrect dtype propagation: Even when torch.full_like(..., dtype=torch.bool) is used in the model, the decomposition overwrites the dtype with input.dtype (e.g., float16), resulting in an incorrect output type.
  2. Device mismatch: When exporting and running models on devices other than cuda:0 (e.g., cuda:1), the decomposition forces outputs onto cuda:0, causing runtime errors or silent bugs due to device mismatch.

To demonstrate the issue, the following test cases are included in this PR:

import torch
from torch.export._trace import _export
from torch_tensorrt.dynamo.lowering import get_decompositions


class MyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x):
        return torch.ones_like(x, dtype=torch.bool)


def test1() -> tuple[bool, str]:
    model = MyModel()
    x = torch.randn(1, 10, dtype=torch.float16)
    y = model(x)
    return y.dtype == torch.bool, f"expected dtype {torch.bool}, and got {y.dtype}"


def test2() -> tuple[bool, str]:
    model = MyModel()
    x = torch.randn(1, 10, dtype=torch.float16)
    ep = _export(model, (x,))
    ep = ep.run_decompositions(get_decompositions(False))
    gm = ep.module()
    y = gm(x)
    return y.dtype == torch.bool, f"expected dtype {torch.bool}, and got {y.dtype}"

def test3() -> tuple[bool, str]:
    device = torch.device("cuda", index=1)
    model = MyModel().to(device)
    x = torch.randn(1, 10, dtype=torch.float16).to(device)
    ep = _export(model, (x,))
    ep = ep.run_decompositions(get_decompositions(False))
    gm = ep.module()
    y = gm(x)
    return y.device == device, f"expected device {device}, and got {y.device}"
    

for test in (test1, test2, test3):
    success, msg = test()
    print(f"{test.__name__}: {'Success' if success else 'Failed'} - {msg}")

Results:

test1: Success - expected dtype torch.bool, and got torch.bool
test2: Failed - expected dtype torch.bool, and got torch.float16
test3: Failed - expected device cuda:1, and got cuda:0
  • test1: Verifies that torch.ones_like returns a tensor with the correct dtype.
  • test2: Shows that the exported model via torch.export(...).run_decompositions(...) fails to preserve dtype.
  • test3: Demonstrates the incorrect device assignment after decomposition when using non-default CUDA devices.

This PR fixes the decomposition logic so that it respects explicitly passed dtype and device values, falling back to values inferred from the input tensor only when they are not provided.
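
For illustration, here is a minimal sketch of the corrected fallback behavior described above (a sketch only; the exact merged code may differ in detail):

@register_torch_trt_decomposition(
    torch.ops.aten.full_like, registry=TORCH_TRT_DECOMPOSITIONS
)  # type: ignore
def full_like_decomposition(*args, **kwargs) -> torch.Tensor:
    input = args[0]
    shape = args[0].shape
    fill_value = args[1]
    # Respect an explicitly passed dtype; otherwise fall back to the input's dtype.
    kwargs["dtype"] = kwargs.get("dtype", None) or input.dtype
    # Place the output on the input tensor's device instead of forcing default_device().
    kwargs["device"] = input.device
    return torch.full(shape, fill_value, dtype=kwargs["dtype"], device=kwargs["device"])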

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • My code follows the style guidelines of this project (You can use the linters)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas and hacks
  • I have made corresponding changes to the documentation
  • I have added tests to verify my fix or my feature
  • New and existing unit tests pass locally with my changes
  • I have added the relevant labels to my PR so that the relevant reviewers are notified

@facebook-github-bot (Contributor)

Hi @junstar92!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

@github-actions bot added the component: lowering (Issues re: The lowering / preprocessing passes), component: api [Python] (Issues re: Python API), and component: dynamo (Issues relating to the `torch.compile` or `torch._dynamo.export` paths) labels on May 28, 2025
@facebook-github-bot (Contributor)

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@narendasan (Collaborator)

@junstar92 thanks for the PR, we will review it!

@peri044 (Collaborator) left a comment

LGTM. Thanks for the fix

@peri044 merged commit ee32da0 into pytorch:main on May 28, 2025
84 checks passed
@junstar92 (Contributor, Author)

@peri044 Thanks for the quick review.

However, I missed that device needs to be handled in the same way as dtype, since full_like also accepts device as an argument:

kwargs["device"] = kwargs.get("device", None) or input.device

Without handling it, a device different from the input's may be used, causing mismatches.
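
A minimal sketch of how the dtype and device fallbacks might read together inside the decomposition (hypothetical; it mirrors the dtype handling above):

kwargs["dtype"] = kwargs.get("dtype", None) or input.dtype
kwargs["device"] = kwargs.get("device", None) or input.device
return torch.full(shape, fill_value, dtype=kwargs["dtype"], device=kwargs["device"])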
Should I open a new PR to address the remaining issue?

@apbose (Collaborator) commented May 29, 2025

Hi @junstar92, you could open a new PR.
