I've run some experiments with the BEST-RQ model, including SSL pretraining and supervised finetuning, both on the WenetSpeech dataset. Luckily, the pretraining is stable: with the number of codebooks set to 1, the training accuracy reaches around 0.3. The attached image shows the training curves (the yellow line is WenetSpeech only, the blue one is WenetSpeech + some industrial data).
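For context, my understanding of the BEST-RQ pretraining targets with a single codebook is: a frozen random projection of the input features, followed by a nearest-neighbour lookup in a frozen random codebook, and the encoder is trained to predict those ids at masked frames. A minimal PyTorch sketch of that target generation (the class name, `feat_dim`, `codebook_size`, and `codebook_dim` are placeholders, not taken from any particular recipe):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RandomProjectionQuantizer(nn.Module):
    """Frozen random projection + frozen random codebook (single codebook)."""

    def __init__(self, feat_dim=80, codebook_size=8192, codebook_dim=16):
        super().__init__()
        # Both the projection and the codebook are randomly initialized
        # and never updated during pretraining.
        proj = torch.empty(feat_dim, codebook_dim)
        nn.init.xavier_uniform_(proj)
        codebook = F.normalize(torch.randn(codebook_size, codebook_dim), dim=-1)
        self.register_buffer("proj", proj)
        self.register_buffer("codebook", codebook)

    @torch.no_grad()
    def forward(self, feats):
        # feats: (batch, time, feat_dim) normalized log-mel features
        x = F.normalize(feats @ self.proj, dim=-1)            # (B, T, codebook_dim)
        # Nearest codebook entry on the unit sphere becomes the target id.
        dist = torch.cdist(x, self.codebook.unsqueeze(0).expand(x.size(0), -1, -1))
        return dist.argmin(dim=-1)                            # (B, T) target ids

# The ids then serve as labels for a cross-entropy loss on masked positions.
quantizer = RandomProjectionQuantizer()
feats = torch.randn(2, 100, 80)
labels = quantizer(feats)   # (2, 100) integer targets
```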

During the supervised finetuning step, I basically froze all encoder parameters and finetuned only the CTC projection layer on WenetSpeech, as a kind of "probing test", but the result was a mess (see the sketch below for what I mean by probing).
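To make the setup comparable, this is roughly the probing loop I mean: the pretrained encoder is frozen and used as a fixed feature extractor, and only a linear CTC head on top is trained. A rough sketch (the encoder here is just a stand-in module; `encoder_dim`, `vocab_size`, and the learning rate are arbitrary placeholders, not my actual config):

```python
import torch
import torch.nn as nn

encoder_dim, vocab_size = 512, 5000
encoder = nn.GRU(80, encoder_dim, batch_first=True)  # stand-in for the pretrained encoder

# Freeze every encoder parameter so only the CTC head gets gradient updates.
for p in encoder.parameters():
    p.requires_grad = False
encoder.eval()  # keep dropout / normalization layers in inference mode while probing

ctc_head = nn.Linear(encoder_dim, vocab_size)
optimizer = torch.optim.Adam(ctc_head.parameters(), lr=1e-3)
ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)

def train_step(feats, feat_lens, targets, target_lens):
    with torch.no_grad():                         # encoder acts as a fixed feature extractor
        enc_out, _ = encoder(feats)               # (B, T, encoder_dim)
    log_probs = ctc_head(enc_out).log_softmax(dim=-1)        # (B, T, vocab)
    loss = ctc_loss(log_probs.transpose(0, 1),    # CTCLoss expects (T, B, vocab)
                    targets, feat_lens, target_lens)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```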

I noticed there are a few discussions about SSL models in this community, so I'm opening this discussion issue to see if anyone has run into a similar problem.