# Phoneme Segmentation Using Self-Supervised Speech Models
## Usage
### Obtain Pre-trained Model Checkpoints
wav2vec2.0 and HuBERT checkpoints are available via fairseq (see the wav2vec 2.0 and HuBERT example pages in the fairseq repo). Download these models and place them in a new folder titled `checkpoints`.
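For example, assuming the checkpoints have already been downloaded (the filenames below are illustrative fairseq release names; keep whatever names your downloads carry):

```bash
# create the folder the code expects at the repo root
mkdir -p checkpoints

# move the downloaded fairseq checkpoints into it
mv ~/Downloads/wav2vec_small.pt checkpoints/
mv ~/Downloads/hubert_base_ls960.pt checkpoints/
```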
### Obtain and Process TIMIT and/or Buckeye Speech Corpus
Once the data has been obtained, it must be stored on disk in a fashion that can be read by the provided dataloader, the core of which is borrowed from Kreuk et al. (https://github.com/felixkreuk/UnsupSeg). See the Data Structure section of this repo for specifics, or simply use the provided `utils/make_timit.py` and `utils/make_buckeye.py` to split and organize the data exactly as we did. Note: both of these scripts are also credited to Kreuk et al., save a few minor changes.
You can run `make_timit.py` and `make_buckeye.py` as follows:
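The flags below follow Kreuk et al.'s versions of these scripts and may differ slightly here; check each script's argument parser for the authoritative names:

```bash
# split and organize TIMIT into the expected train/val/test layout
python utils/make_timit.py --inpath /path/to/raw/TIMIT --outpath /path/to/processed/timit

# likewise for Buckeye
python utils/make_buckeye.py --inpath /path/to/raw/buckeye --outpath /path/to/processed/buckeye
```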
Note: we do not provide the infrastructure here to train these models using pseudo-labels derived from a trained unsupervised model; however, the core implementation can easily be extended to train with alternate label supervision so long as the dataloader's interface remains unchanged. For those interested in training such a model, we direct you to Kreuk et al., where a pretrained unsupervised model can be used to generate pseudo-labels for TIMIT.
### Update Configuration YAML
The following fields will need to be updated to reflect local paths on your machine:
- `timit_path`
- `buckeye_path`
- `base_ckpt_path`
You may also want to experiment with the `num_workers` attribute depending on your hardware.
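As a rough sketch, the relevant portion of the YAML might look like this (all values are placeholders for your local paths):

```yaml
timit_path: /path/to/processed/timit
buckeye_path: /path/to/processed/buckeye
base_ckpt_path: /path/to/checkpoints/wav2vec_small.pt  # or a HuBERT checkpoint
num_workers: 4  # tune to your CPU core count
```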
### Training and Testing
To freeze the pre-trained model weights and train only a classifier readout model on TIMIT with a wav2vec2.0 backbone, run the following:
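A hydra-style invocation along these lines should work; note that `mode=readout` is an assumed name for the frozen-backbone setting (check the config for the exact value), while the other overrides are the ones referenced below:

```bash
python run.py data=timit base_ckpt_path=/path/to/wav2vec2.0_ckpt mode=readout
```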
`data=timit` can easily be swapped for `data=buckeye` just as `base_ckpt_path=/path/to/wav2vec2.0_ckpt` can be swapped with `base_ckpt_path=/path/to/hubert_ckpt`.
To finetune the whole pre-trained model and simply project final features with a linear readout, set `lr=0.0001` and `mode=finetune`. Otherwise, the same swapping for TIMIT/Buckeye and wav2vec2.0/HuBERT applies.
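For instance, with the same hedges as above:

```bash
python run.py data=timit base_ckpt_path=/path/to/wav2vec2.0_ckpt mode=finetune lr=0.0001
```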
Invoking `run.py` will train a model from scratch for 50 epochs, printing training stats every 10 batches and running model validation every 50 batches. These preferences can be changed in the config via the `print_interval` and `val_interval` attributes; `epochs` can also be modified if desired.
During training, models are saved to disk whenever they achieve the best R-value so far on the validation set. After training is complete, the best model is loaded from disk and evaluated on the test set. Performance metrics under both the harsh and lenient evaluation schemes are logged to standard out.
Lastly, every invocation of `run.py` creates an output folder under `outputs/datestamp/{exp_name}_timestamp`, where model checkpoints are saved along with the full runtime config and a `run.log` file. Everything logged to standard output during training is also written to `run.log`.
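The resulting layout looks roughly like this (file names are illustrative):

```text
outputs/
└── 2024-05-01/                 # datestamp
    └── my_exp_13-37-00/        # {exp_name}_timestamp
        ├── best_model.pt       # best validation R-value checkpoint (name assumed)
        ├── config.yaml         # full runtime config
        └── run.log             # mirror of everything printed to stdout
```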
### Additional
This codebase assumes CUDA availability.
The config `seed` attribute can be changed to control random shuffling and initialization.
`train_percent` indicates the fraction of the training set to use. Some may be interested in observing model/training-data efficiency by sweeping over this attribute. Sweeps are easily accommodated using Hydra's multi-run command-line option; for more, see the Hydra docs.
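For example, Hydra's `--multirun` (or `-m`) flag launches one run per listed value:

```bash
# sweep the training-set fraction; Hydra queues one run per value
python run.py -m train_percent=0.1,0.25,0.5,1.0
```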