Open
Labels
good first issue (Good for newcomers)
Description
Hi, thanks for releasing this great project!
I noticed in the code that when speed_perturb is enabled, the number of classes is multiplied by 3 (line 130, train.py):
    if configs['data_type'] != 'feat' and configs['dataset_args'][
            'speed_perturb']:
        # diff speed is regarded as diff spk
        configs['projection_args']['num_class'] *= 3
        if configs.get('do_lm', False):
            logger.info(
                'No speed perturb while doing large margin fine-tuning')
            configs['dataset_args']['speed_perturb'] = False
This seems to treat each speed-perturbed version of an utterance as if it were from a different speaker.
I would expect speed perturbation to keep the same speaker label (since the identity doesn’t change, only the speaking rate).
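To make my reading concrete, here is a minimal sketch of the two labeling schemes as I understand them. Everything in it is hypothetical and for illustration only: the `SPEEDS` list, the function names, and the offset formula `spk_id + num_spks * speed_idx` are my assumptions, not code copied from the repository.

```python
# Hypothetical sketch; names, speeds, and the offset formula are assumptions.
SPEEDS = [1.0, 0.9, 1.1]  # assumed: one original speed plus two perturbed speeds


def label_as_new_speaker(spk_id: int, num_spks: int, speed_idx: int) -> int:
    """Give each perturbed speed its own block of num_spks class indices.

    This is the scheme that would require num_class to be multiplied by 3.
    """
    if not 0 <= speed_idx < len(SPEEDS):
        raise ValueError(f"speed_idx must be in [0, {len(SPEEDS) - 1}]")
    if not 0 <= spk_id < num_spks:
        raise ValueError("spk_id must be in [0, num_spks)")
    return spk_id + num_spks * speed_idx


def label_as_same_speaker(spk_id: int, num_spks: int, speed_idx: int) -> int:
    """Keep the original speaker label regardless of speed (what I expected)."""
    return spk_id


def num_classes(num_spks: int, treat_speed_as_speaker: bool) -> int:
    """Size of the classification layer under each scheme."""
    return num_spks * len(SPEEDS) if treat_speed_as_speaker else num_spks
```

With 100 training speakers, the first scheme maps speaker 5 at the third speed to class 205 and needs a 300-way classifier, while the second keeps class 5 and a 100-way classifier.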
Could you clarify the motivation or provide references for this design choice?
Is there a specific paper or benchmark showing that treating speed-perturbed audio as different speakers improves performance?
Wouldn’t this risk confusing the model by artificially inflating the number of classes?
Thanks a lot!