Describe the bug
When training examples/aishell/whisper with DeepSpeed, the following error is raised:
python3.10/site-packages/deepspeed/runtime/torch_autocast.py", line 97, in validate_nested_autocast
raise AssertionError(
AssertionError: torch.autocast is enabled outside DeepSpeed, but not in the DeepSpeed config. Please enable torch.autocast through the DeepSpeed config to ensure the correct communication dtype is used.
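The assertion asks for torch.autocast to be enabled through the DeepSpeed config rather than in the training code. Below is a minimal sketch of what that config excerpt might look like. The "torch_autocast" section with "enabled" and "dtype" keys is our assumption based on the DeepSpeed torch_autocast runtime and should be checked against the 0.17.5 docs. Turning off the separate "bf16" section is also an assumption, made so that autocast alone manages bf16 precision.

```python
import json

# Hypothetical DeepSpeed config excerpt. Key names are assumptions to verify
# against the DeepSpeed 0.17.5 documentation before use.
ds_config = {
    # Assumption: disable engine-managed bf16 so parameters stay float32
    # and torch.autocast decides which ops run in bf16.
    "bf16": {"enabled": False},
    # Assumption: section that tells DeepSpeed autocast is active, so the
    # nested-autocast validation passes and communication uses this dtype.
    "torch_autocast": {
        "enabled": True,
        "dtype": "bfloat16",
    },
}

config_text = json.dumps(ds_config, indent=2)
print(config_text)
```

The printed JSON would be merged into the existing DeepSpeed config file for the whisper recipe, leaving the other training settings unchanged.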
After modifying the batch_forward function to remove the "with autocast" block, training runs, but a dtype mismatch error appears:
ch/nn/modules/conv.py", line 370, in _conv_forward
return F.conv1d(
RuntimeError: Input type (float) and bias type (c10::BFloat16) should be the same
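Once autocast is removed, nothing casts the float32 fbank features to match the conv weights, which have apparently already been cast to bf16. One way to handle this is a small helper that casts floating-point inputs to the module's parameter dtype. The helper name match_param_dtype and the 80-channel Conv1d stand-in are hypothetical; the sketch only checks the cast and does not run the conv forward pass.

```python
import torch
import torch.nn as nn


def match_param_dtype(x: torch.Tensor, module: nn.Module) -> torch.Tensor:
    """Cast a floating-point input to the dtype of the module's parameters.

    Hypothetical helper: assumes all parameters of `module` share one dtype,
    e.g. after the whole model has been cast to bf16.
    """
    param_dtype = next(module.parameters()).dtype
    # Only float tensors are cast; integer tensors such as lengths pass through.
    if x.is_floating_point() and x.dtype != param_dtype:
        return x.to(param_dtype)
    return x


# Stand-in for the Whisper front-end conv, with weight and bias in bf16.
conv = nn.Conv1d(in_channels=80, out_channels=16, kernel_size=3, padding=1).to(torch.bfloat16)
feats = torch.randn(2, 80, 50)  # features arrive as float32
feats = match_param_dtype(feats, conv)
print(feats.dtype)  # torch.bfloat16
```

Casting at the model input keeps every later layer consistent, but losses that lack bf16 kernels still need float32 inputs.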
After manually casting the input to bf16, the loss computation fails with a new error:
RuntimeError: "ctc_loss_cuda" not implemented for 'BFloat16'
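The CTC kernel has no BFloat16 implementation, so a common mixed-precision pattern is to upcast the logits to float32 just before the loss. Below is a minimal sketch with a hypothetical helper ctc_loss_fp32. It uses torch.nn.functional.ctc_loss directly rather than the recipe's own CTC module, and the shapes are illustrative.

```python
import torch
import torch.nn.functional as F


def ctc_loss_fp32(logits, targets, input_lengths, target_lengths, blank=0):
    """Compute CTC loss in float32 even when the model emits bf16 logits.

    Hypothetical helper; logits are (T, N, C), as F.ctc_loss expects.
    """
    # ctc_loss has no BFloat16 kernel, so upcast before log_softmax.
    log_probs = logits.float().log_softmax(dim=-1)
    return F.ctc_loss(
        log_probs, targets, input_lengths, target_lengths,
        blank=blank, reduction="sum", zero_infinity=True,
    )


T, N, C, S = 50, 2, 20, 10  # frames, batch, vocab size, target length
logits = torch.randn(T, N, C, dtype=torch.bfloat16)  # e.g. bf16 model output
targets = torch.randint(1, C, (N, S), dtype=torch.long)  # labels exclude blank=0
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), S, dtype=torch.long)

loss = ctc_loss_fp32(logits, targets, input_lengths, target_lengths)
print(loss.dtype)  # torch.float32
```

Gradients still flow through .float() back into the bf16 logits, so only the loss itself runs in float32.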
torch version: 2.6.0
deepspeed: 0.17.5