Description
Hi,
I’m seeing reproducible numerical discrepancies when the model runs on the same physical GPU (RTX A6000) but is addressed with two different logical indices.
Minimal repro (run each scenario in a separate fresh process, since CUDA_VISIBLE_DEVICES only takes effect if it is set before torch is first imported): out_A and out_B differ, and scenario B shows the correct behavior:
import os
from groundingdino.util.inference import load_model, predict
from PIL import Image
from torchvision.transforms import ToTensor
CFG = "GroundingDINO_SwinT_OGC.py"
CKPT = "groundingdino_swint_ogc.pth"
IMG = Image.open("test.jpg")
IMG_TENSOR = ToTensor()(IMG)
TEXT = "green chair."
# -------- scenario A: direct index ---------------------------------
DEVICE = "cuda:5" # physical GPU‑5
model = load_model(CFG, CKPT).to(DEVICE)
out_A = predict(model, IMG_TENSOR, TEXT, 0.4, 0.35, device=DEVICE)
# -------- scenario B: masked index ---------------------------------
os.environ["CUDA_VISIBLE_DEVICES"] = "5" # mask must be set BEFORE the first torch import
import torch
DEVICE = torch.device("cuda:0") # logical 0 → physical GPU‑5
model = load_model(CFG, CKPT).to(DEVICE)
out_B = predict(model, IMG_TENSOR, TEXT, 0.4, 0.35, device=DEVICE)
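To quantify the discrepancy, I compare the two outputs element-wise. A small helper like the following could be used (the unpacking of predict's return value into (boxes, logits, phrases) is my assumption about the API; the demo tensors are dummies):

```python
import torch

def max_abs_diff(a: torch.Tensor, b: torch.Tensor) -> float:
    """Largest element-wise absolute difference between two tensors."""
    return (a.cpu() - b.cpu()).abs().max().item()

# Usage with the repro above (assuming out_A / out_B are
# (boxes, logits, phrases) tuples returned by predict()):
#   boxes_a, logits_a, _ = out_A
#   boxes_b, logits_b, _ = out_B
#   print(max_abs_diff(boxes_a, boxes_b), max_abs_diff(logits_a, logits_b))

# Self-contained demo on dummy tensors:
a = torch.tensor([0.10, 0.20, 0.30])
b = torch.tensor([0.10, 0.25, 0.30])
print(max_abs_diff(a, b))
```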
Problem Location
I then tracked the problem down to line 354 of groundingdino/models/GroundingDINO/ms_deform_attn.py:
output = self.output_proj(output)
The output tensor going into the Linear is identical between the two scenarios, and the weights/bias are also identical (a state-dict diff confirms this). Only the result differs.
Since self.output_proj is just a plain linear layer, I cannot dig any deeper. Could you take a look at the code and see what is going on there?
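Not part of the original report, but one diagnostic I could run: on Ampere GPUs such as the A6000, TF32 matmuls can introduce exactly this kind of small drift in a bare nn.Linear, so it may be worth isolating the layer and toggling the TF32 flag. A minimal sketch (the 256-dim shapes are illustrative assumptions, not taken from the model config):

```python
import torch
import torch.nn as nn

# Isolate a bare Linear and compare its result with TF32 enabled vs disabled.
# On CPU both paths use full FP32, so the difference there should be zero;
# on an Ampere GPU a nonzero diff would point at TF32 as the culprit.
torch.manual_seed(0)
device = "cuda" if torch.cuda.is_available() else "cpu"

proj = nn.Linear(256, 256).to(device)          # stand-in for output_proj
x = torch.randn(1, 1024, 256, device=device)   # dummy input

with torch.no_grad():
    torch.backends.cuda.matmul.allow_tf32 = True
    y_tf32 = proj(x)
    torch.backends.cuda.matmul.allow_tf32 = False
    y_fp32 = proj(x)

max_diff = (y_tf32 - y_fp32).abs().max().item()
print(f"max abs diff (TF32 vs FP32): {max_diff:.3e}")
```

If this standalone layer already shows a difference between the two device setups, the issue is in the matmul backend selection rather than in GroundingDINO itself.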
Thank you very much for your help!
Environment:
GPU: RTX A6000
Driver: 550.54.15
CUDA: 12.4
PyTorch: 2.6.0+cu124