Try to match `torch.ops.fbgemm.quantize_fp8_per_row`, `_choose_qparams_affine_float8`, and `_quantize_affine_float8`.
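
For orientation, per-row fp8 quantization is commonly split into two steps: choose one scale per row, then scale and clamp each row to the fp8 range. The sketch below is a plain-Python illustration under that assumption, not the implementation of any of the functions named above. `choose_row_scales`, `scale_and_clamp`, and `eps` are hypothetical names, and the real functions may differ in scale clamping, rounding, and dtype casts. The final cast to an fp8 dtype is omitted.

```python
# Largest finite value representable in float8 e4m3fn.
E4M3_MAX = 448.0


def choose_row_scales(rows, eps=1e-12):
    """Return one scale per row: max(|row|) / E4M3_MAX, floored at eps.

    The eps floor is an assumption here; it only avoids division by zero
    for all-zero rows.
    """
    return [max(max(abs(v) for v in row) / E4M3_MAX, eps) for row in rows]


def scale_and_clamp(rows, scales):
    """Divide each row by its scale and clamp to [-E4M3_MAX, E4M3_MAX]."""
    return [
        [min(max(v / s, -E4M3_MAX), E4M3_MAX) for v in row]
        for row, s in zip(rows, scales)
    ]


rows = [[1.0, -2.0, 0.5], [0.0, 0.0, 0.0], [896.0, 448.0, -224.0]]
scales = choose_row_scales(rows)
scaled = scale_and_clamp(rows, scales)
```

When comparing a fused per-row op against a two-step path like this, comparing the scales and the quantized values separately makes it easier to see which step a mismatch comes from.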