Should I modify the Quartznet 5x15 architecture for longer sentences? #2109
Replies: 2 comments
-
Yes, increasing depth will increase your effective receptive field.
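To make that concrete: for a stack of 1-D convolutions, each layer widens the receptive field by `(kernel - 1) * dilation` input steps (scaled by any upstream striding), so doubling the number of conv layers roughly doubles the field. A minimal sketch, using illustrative kernel sizes rather than QuartzNet's exact configuration:

```python
def receptive_field(layers):
    """Effective receptive field (in input frames) of a stack of 1-D convs.

    layers: list of (kernel_size, stride, dilation) tuples, in order.
    """
    rf, jump = 1, 1
    for k, s, d in layers:
        rf += (k - 1) * d * jump  # each layer widens the field by (k-1)*d input steps
        jump *= s                 # stride compounds the step size for later layers
    return rf

# Doubling the repeats grows the receptive field roughly linearly:
shallow = [(33, 1, 1)] * 5   # 5 convs with kernel 33 -> 161 frames
deep    = [(33, 1, 1)] * 10  # 10 convs with kernel 33 -> 321 frames
print(receptive_field(shallow), receptive_field(deep))
```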
-
So 5x30 or 10x15? Can you say intuitively what the 5 and 15 do in 5x15? If it were 5x30, could I start with the weights for 5x15 and randomize the extra 5x15 to begin with? NOTE: I think what I'm really going to do is just manually inspect and split the very long annotated training phrases into shorter ones, because I can chunk the test set any size I want. Also, to improve performance I will manually inspect and discard bad training data, for example where a 19-word transcription is associated with a single short "Hunh" sound. In practice I've observed that 6 seconds is the longest anybody speaks without pausing at all.
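The bad-data filtering described above can be scripted against a NeMo-style JSON-lines manifest (one object per line with `audio_filepath`, `duration`, and `text` fields). A rough sketch, with illustrative thresholds; tune them to your data:

```python
import json

def filter_manifest(in_path, out_path, max_duration=16.0, max_words_per_sec=6.0):
    """Copy manifest entries to out_path, dropping clips that are too long
    or have implausibly dense transcripts (e.g. 19 words on a short grunt)."""
    kept = dropped = 0
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            entry = json.loads(line)
            dur = entry["duration"]
            words = len(entry["text"].split())
            if dur <= max_duration and words / max(dur, 1e-6) <= max_words_per_sec:
                fout.write(json.dumps(entry) + "\n")
                kept += 1
            else:
                dropped += 1
    return kept, dropped
```

This only flags mechanical outliers; entries it drops are still worth a manual look before discarding.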
-
I have training data where some clips are up to 33 seconds long. I've tried splitting them on silence and allocating the text proportional to size but that is very ad hoc. So I prefer to keep them.
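For reference, the proportional allocation described above can be sketched as below. It is, as noted, a rough heuristic; a forced aligner would give a more principled word-to-chunk assignment:

```python
def allocate_words(words, chunk_durations):
    """Split a word list across audio chunks proportionally to chunk duration."""
    total = sum(chunk_durations)
    out, start, acc = [], 0, 0.0
    for dur in chunk_durations:
        acc += dur
        end = round(len(words) * acc / total)  # cumulative share of the words
        out.append(words[start:end])
        start = end
    return out

allocate_words("the quick brown fox jumps over".split(), [2.0, 1.0])
# → [['the', 'quick', 'brown', 'fox'], ['jumps', 'over']]
```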
I've noticed during training that QuartzNet 15x5 tends to make more errors at the end of a phrase than at the beginning. I feel like it may be losing long-term dependencies, or that those are harder to learn.
On the other hand, what I hear you saying about the parameter space is that it is independent of the `max_duration` parameter of the config. So it's the same number of parameters to parse 1-second phrases as 33-second phrases. Is that really true? Or can I get better long-term dependency by adding layers? If so, can you recommend a config which will work better for training against longer phrases?
If that would improve matters, I assume that means I have to train from scratch? I.e. if I add layers, I can't use the pretrained weights for the base Q15x5? Or if I can, can you give me a snippet of code which would show me how to add the extra layers to the pretrained model?
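One common PyTorch pattern for this (a sketch, not an official NeMo recipe) is to build the deeper model, then merge the pretrained state dict into it by parameter name and shape; layers that exist only in the deeper model keep their random initialization. This assumes the extra repeats are added so that the existing modules keep their original names; if new blocks are inserted mid-network and the numbering shifts, few names will match. `merge_pretrained` is a hypothetical helper name:

```python
def merge_pretrained(new_state, pretrained_state):
    """Build a state dict for the deeper model: use the pretrained tensor
    wherever the parameter name and shape match, otherwise keep the new
    model's (randomly initialized) tensor."""
    merged = dict(new_state)           # start from the deeper model's own init
    for name, tensor in pretrained_state.items():
        if name in merged and merged[name].shape == tensor.shape:
            merged[name] = tensor      # reuse the pretrained weight
    return merged

# Usage sketch with PyTorch modules (model names are hypothetical):
# deeper_model.load_state_dict(
#     merge_pretrained(deeper_model.state_dict(), base_model.state_dict()))
```

PyTorch's `load_state_dict(..., strict=False)` handles the name-matching part on its own, but it still raises on shape mismatches, hence the explicit shape check here.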