Static kTrieMaxLabel=6 causes issues with phoneme-based recognition #75

@JackTemaki

Bug Description

I tried building ASR systems on a common standard task (LibriSpeech-100h) using the torchaudio CTC decoder, which uses the flashlight/text library as its decoding backend. While my subword (BPE) based setups worked fine, the phoneme-based ones did not.

The standard LibriSpeech lexicon includes, for example, these 7 words, which all map to the same phone sequence in ARPA notation:

BAE B AY#
BAI B AY#
BI B AY#
BUY B AY#
BY B AY#
BY' B AY#
BYE B AY#

As a result, the word BY, for example, was no longer recognized.
In the log I get the message:
[Trie] Trie label number reached limit: 6
This correctly reports that the limit was applied, but I would like to point out that the limit is very low and cannot be changed without recompiling. The message also did not look like a serious issue to me at first.

Reproduction Steps

  • Use torchaudio ctc_decoder with a phoneme-based lexicon in which more than 6 words share the same phone sequence.
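
To check whether a given lexicon would run into this case, a small script along the following lines could help. It is only a sketch: it assumes the plain "WORD phone phone ..." line format shown above and that the limit of 6 from the log message applies per identical phone sequence. The function name find_overfull_homophones and the extra CAT entry are made up for illustration and are not part of torchaudio or flashlight/text.

```python
from collections import defaultdict

# Limit quoted in the log message; assumed here to apply per phone sequence.
TRIE_MAX_LABEL = 6


def find_overfull_homophones(lexicon_lines, limit=TRIE_MAX_LABEL):
    """Group lexicon entries by phone sequence; return groups with more than `limit` words."""
    groups = defaultdict(list)
    for line in lexicon_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, phones = parts[0], tuple(parts[1:])
        if word not in groups[phones]:  # ignore exact duplicate entries
            groups[phones].append(word)
    return {phones: words for phones, words in groups.items() if len(words) > limit}


example = [
    "BAE B AY#",
    "BAI B AY#",
    "BI B AY#",
    "BUY B AY#",
    "BY B AY#",
    "BY' B AY#",
    "BYE B AY#",
    "CAT K AE T#",  # hypothetical extra entry, not taken from the issue
]

overfull = find_overfull_homophones(example)
for phones, words in overfull.items():
    print(" ".join(phones), "->", len(words), "words:", ", ".join(words))
```

For a real lexicon, the open file object can be passed directly in place of the example list, since the function only iterates over lines. Any phone sequence it reports is a candidate for the dropped-word behavior described above.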

Labels: bug (Something isn't working)