
What whisper.cpp version does pywhispercpp bind to? #73

@zillionare


Forgive me if this is a dumb question; I'm completely unfamiliar with pybind. However, when I compared pywhispercpp with the whisper.cpp CLI, I noticed a difference in the output.

The comparison was made under the same conditions:

  1. Same prompt: "好,我们开始上课。请输出简体中文,以下是专有名词" (meaning: "OK, let's start the lesson. Please output Simplified Chinese (zh-CN, not zh-TW). The following are proper nouns")
  2. This is how the whisper.cpp CLI is invoked: ./main -l zh -t 8 -m models/ggml-large-v2.bin -osrt --prompt '好,我们开始上课.请输出简体中文' -of /tmp/whisper.cpp.srt /tmp/output005.wav
  3. This is how I construct the pywhispercpp model:
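    from pywhispercpp.model import Model

    prompt = "好,我们开始上课。请输出简体中文,以下是专有名词"
    input_audio = "/tmp/output005.wav"  # same audio file as the CLI run

    def new_segment_callback(segment):
        # Print each segment's text as soon as it is decoded
        print(segment.text)

    # Assumed: the same model file as the CLI run (Model accepts a name or a ggml file path)
    model = Model("models/ggml-large-v2.bin",
                  n_threads=8,
                  n_max_text_ctx=448,
                  max_len=30,
                  split_on_word=True,
                  initial_prompt=prompt,
                  language="zh")
    segments = model.transcribe(input_audio, new_segment_callback=new_segment_callback)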
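To see how closely the two runs in items 2 and 3 match, their settings can be diffed mechanically. The sketch below is a minimal illustration and is not part of either tool: FLAG_TO_PARAM is a hypothetical, hand-written mapping that covers only the three flags used here (-l, -t, --prompt). It also assumes that the trailing quotation mark in item 1 is not part of the prompt.

```python
import shlex

# CLI command from item 2, copied verbatim.
CLI_COMMAND = (
    "./main -l zh -t 8 -m models/ggml-large-v2.bin -osrt "
    "--prompt '好,我们开始上课.请输出简体中文' "
    "-of /tmp/whisper.cpp.srt /tmp/output005.wav"
)

# Keyword arguments passed to pywhispercpp's Model in item 3.
PY_PARAMS = {
    "n_threads": 8,
    "n_max_text_ctx": 448,
    "max_len": 30,
    "split_on_word": True,
    "initial_prompt": "好,我们开始上课。请输出简体中文,以下是专有名词",
    "language": "zh",
}

# Hypothetical mapping from the CLI flags above to pywhispercpp parameter names.
FLAG_TO_PARAM = {"-l": "language", "-t": "n_threads", "--prompt": "initial_prompt"}


def cli_params(command: str) -> dict:
    """Extract the mapped flags and their values from a whisper.cpp command line."""
    argv = shlex.split(command)  # handles the single-quoted prompt
    params = {}
    # Walk over (flag, next token) pairs; unmapped flags like -osrt are ignored.
    for flag, value in zip(argv, argv[1:]):
        if flag in FLAG_TO_PARAM:
            name = FLAG_TO_PARAM[flag]
            params[name] = int(value) if name == "n_threads" else value
    return params


def diff_params(cli: dict, py: dict) -> dict:
    """Return {name: (cli_value, python_value)} for every parameter that differs.

    None means the parameter was not set explicitly on that side.
    """
    return {
        name: (cli.get(name), py.get(name))
        for name in sorted(cli.keys() | py.keys())
        if cli.get(name) != py.get(name)
    }


if __name__ == "__main__":
    differences = diff_params(cli_params(CLI_COMMAND), PY_PARAMS)
    for name, (cli_value, py_value) in differences.items():
        print(f"{name}: cli={cli_value!r} python={py_value!r}")
```

With the values above, it reports four differences. The first is initial_prompt: the CLI prompt uses an ASCII period and omits 以下是专有名词. The other three are max_len, n_max_text_ctx and split_on_word, which are set only on the Python side. If I read the whisper.cpp help correctly, the CLI counterparts are -ml, -mc and -sow. Adding those flags, or dropping the extra keyword arguments, should give a like-for-like comparison.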

However, pywhispercpp outputs Traditional Chinese characters, and its transcription is less accurate than whisper.cpp's.

So, is there a difference between pywhispercpp and whisper.cpp? The whisper.cpp version I used is 1.7.1, released four days ago, so I wonder whether pywhispercpp has not yet been bound to the latest release.
