@SYNX7007

Before submitting

What does this PR do?

Fixes #5642.

1. Fix HuBERT Masking Logic

The apply_mask method in fairseq/models/hubert/hubert.py called compute_mask_indices with the default require_same_masks=True. With that setting, variable-length sequences in the same batch were masked incorrectly: the amount of masking was constrained by the shortest sequence, so longer sequences ended up under-masked. The call now passes require_same_masks=False.
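To illustrate why forcing the same number of masks across a batch under-masks longer sequences, here is a toy model of the effect. It is not fairseq's compute_mask_indices: the function name toy_mask_fractions and its parameters are hypothetical, and it masks single positions rather than spans, so the percentages differ from the ones reported under Verification.

```python
import random


def toy_mask_fractions(lengths, mask_percent, require_same_masks, seed=0):
    """Return the masked fraction per sequence (toy model, hypothetical helper)."""
    rng = random.Random(seed)
    # Pick masked positions independently for each sequence.
    per_seq = [rng.sample(range(n), n * mask_percent // 100) for n in lengths]
    if require_same_masks:
        # Keep only as many positions as the least-masked sequence has,
        # which is what limits the longer sequences.
        keep = min(len(idx) for idx in per_seq)
        per_seq = [rng.sample(idx, keep) for idx in per_seq]
    return [len(idx) / n for idx, n in zip(per_seq, lengths)]


same = toy_mask_fractions([100, 50], 65, require_same_masks=True)
free = toy_mask_fractions([100, 50], 65, require_same_masks=False)
print(same, free)
```

In this sketch the length-100 sequence drops to about half of its target rate when the counts are forced to match, while the length-50 sequence is unaffected.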

2. Python 3.11 Compatibility

Fixed a ValueError in fairseq/dataclass/configs.py caused by mutable default values on dataclass fields (for example, common: CommonConfig = CommonConfig()), which Python 3.11+ does not allow. Switched these fields to field(default_factory=...).
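A minimal sketch of the pattern, using simplified stand-ins rather than the real fairseq classes: CommonConfig here has a single illustrative field, and TopLevelConfig is a hypothetical name for the config that nests it.

```python
from dataclasses import dataclass, field


@dataclass
class CommonConfig:
    # Simplified stand-in; the real class has many more fields.
    seed: int = 1


@dataclass
class TopLevelConfig:
    # common: CommonConfig = CommonConfig()  # raises ValueError on Python 3.11+
    # default_factory builds a fresh CommonConfig for every instance instead.
    common: CommonConfig = field(default_factory=CommonConfig)


a = TopLevelConfig()
b = TopLevelConfig()
a.common.seed = 42  # does not leak into b, since each instance owns its default
print(a.common.seed, b.common.seed)
```

Besides satisfying the Python 3.11+ check, default_factory avoids sharing one default object across all instances.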

Verification

Verified with a reproduction script simulating a batch with lengths [100, 50].

  • Before fix: the length-100 sample was under-masked (~21% masked instead of the expected 65%).
  • After fix: the length-100 sample is masked at the expected rate (~47-65%).

PR review

Ready for review!

@meta-cla meta-cla bot added the CLA Signed label Jan 15, 2026
Development

Successfully merging this pull request may close these issues.

apply_mask in HuBERT does not apply the expected percentage of masks if the input shapes differ, which is the case in fine-tuning.