parakeet-tdt_ctc-110m model rnnt loss scale is strange #14707
AI generated, please verify. You're right to question this configuration. In the parakeet-tdt_ctc-110m model, using different reduction methods for the two loss components creates an imbalance: the TDT (RNNT) loss is reduced with "mean-volume" while the CTC loss is reduced with "mean-batch", so the two terms are normalized on different scales.
This discrepancy likely gives the CTC component more influence than intended, potentially undermining TDT performance. Standard practice is to use the same reduction method for both components to keep training balanced. This appears to be unusual compared to other hybrid TDT-CTC models in the NeMo repository, which typically use consistent reduction methods. Without knowing the specific training objectives, it's hard to determine whether this was intentional or an oversight, but your concern about sub-optimal TDT performance is valid.
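For context, here is a minimal sketch of what the two reduction modes plausibly compute, assuming "mean-batch" averages the per-utterance losses over the batch while "mean-volume" divides the summed losses by the total number of target tokens (a common reading of these names; the exact NeMo semantics may differ):

```python
import torch

# Hypothetical per-utterance losses and target lengths for a batch of 3.
per_utt_loss = torch.tensor([120.0, 300.0, 60.0])  # summed per-utterance losses
target_lens = torch.tensor([40, 100, 20])          # target tokens per utterance

# "mean-batch": average loss per utterance.
mean_batch = per_utt_loss.mean()                      # 480 / 3 = 160.0

# "mean-volume": average loss per target token over the whole batch.
mean_volume = per_utt_loss.sum() / target_lens.sum()  # 480 / 160 = 3.0

print(mean_batch.item(), mean_volume.item())
```

Under these assumptions the two reductions differ by roughly the average target length per utterance, so a loss reduced with "mean-volume" comes out one to two orders of magnitude smaller than the same loss reduced with "mean-batch".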
model_config.yaml
The model config for nvidia/parakeet-tdt_ctc-110m looks a bit odd to me. The attached model_config.yaml was extracted from the .nemo file.
I think the scaling between the TDT loss and the CTC loss is off.
Currently, rnnt_reduction is set to "mean-volume", while ctc_reduction is "mean-batch".
That setup looks like it would push training toward a pure CTC model rather than a TDT model.
This may not be a critical issue, but the TDT performance could be sub-optimal.
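To make the concern concrete, here is a hedged sketch of how the mismatch could skew a weighted hybrid objective; the ctc_loss_weight name and all values are assumptions for illustration, not read from the actual config:

```python
# Illustrative magnitudes, reusing the scales from the sketch above:
tdt_loss = 3.0    # TDT/RNNT loss after "mean-volume" (per-token scale)
ctc_loss = 160.0  # CTC loss after "mean-batch" (per-utterance scale)

# Hypothetical hybrid combination with an assumed weight:
ctc_loss_weight = 0.3
total = (1.0 - ctc_loss_weight) * tdt_loss + ctc_loss_weight * ctc_loss

# TDT contribution: 0.7 * 3.0   =  2.1
# CTC contribution: 0.3 * 160.0 = 48.0
print(total)  # 50.1 -- dominated by the CTC term despite the smaller weight
```

If the scales really do differ like this, the CTC term would dominate the gradient even with a modest weight, which is the imbalance I am asking about.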
Is there a specific reason why the losses are defined this way?
(parakeet-tdt_ctc-1.1b was also trained with rnnt_reduction set to "mean-volume".)