[Bug] KeyError: 'default_speaker' #4258

Open
apachexc opened this issue May 6, 2025 · 4 comments
Labels
bug Something isn't working

Comments

apachexc commented May 6, 2025

Describe the bug

@app.route("/api/tts", methods=["GET", "POST"])
def tts():
    with lock:
        text = request.headers.get("text") or request.values.get("text", "")
        speaker_idx = request.headers.get("speaker-id") or request.values.get("speaker_id", "")
        language_idx = request.headers.get("language-id") or request.values.get("language_id", "")
        style_wav = request.headers.get("style-wav") or request.values.get("style_wav", "")
        style_wav = style_wav_uri_to_dict(style_wav)

        print(f" > Model input: {text}")
        print(f" > Speaker Idx: {speaker_idx}")
        print(f" > Language Idx: {language_idx}")
        wavs = synthesizer.tts(text, speaker_name=speaker_idx, language_name=language_idx, style_wav=style_wav)
        out = io.BytesIO()
        synthesizer.save_wav(wavs, out)
    return send_file(out, mimetype="audio/wav")

or

    device = "cpu"
    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)
    tts.tts_to_file(text=text,
            file_path="output.wav",
            speaker_wav=style_wav,
            language="en")

ERROR:xtts_server_xc:Exception on /api/tts [POST]
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1455, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 869, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 867, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 852, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/root/TTS/server/xtts_server_xc.py", line 209, in tts
    wavs = synthesizer.tts(text, speaker_name=speaker_idx, language_name=language_idx, style_wav=style_wav)
  File "/root/TTS/utils/synthesizer.py", line 386, in tts
    outputs = self.tts_model.synthesize(
  File "/root/TTS/tts/models/xtts.py", line 411, in synthesize
    gpt_cond_latent, speaker_embedding = self.speaker_manager.speakers[speaker_id].values()
KeyError: 'default_speaker'

The error occurs both with and without the speaker parameter, so the problem appears to lie in how the speaker is resolved.
What value should the speaker parameter have?
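
The last frame of the traceback is a plain dictionary lookup, `self.speaker_manager.speakers[speaker_id]`, so the `KeyError` means the loaded speaker table has no entry named `default_speaker`. A minimal sketch of that failure mode, with a lookup that reports the valid names instead; the `speakers` dict and the `lookup_speaker` helper are hypothetical stand-ins, not the library's code:

```python
# Hypothetical stand-in for speaker_manager.speakers:
# speaker name -> {"gpt_cond_latent": ..., "speaker_embedding": ...}
speakers = {
    "speaker_a": {"gpt_cond_latent": [0.1], "speaker_embedding": [0.2]},
}


def lookup_speaker(speakers, speaker_id):
    """Return (latent, embedding), or raise an error that lists valid names."""
    if speaker_id not in speakers:
        available = ", ".join(sorted(speakers)) or "<none>"
        raise ValueError(f"Unknown speaker {speaker_id!r}; available: {available}")
    # Same unpacking pattern as the failing line in the traceback.
    latent, embedding = speakers[speaker_id].values()
    return latent, embedding
```

Printing the keys of the real speaker table in the running container would show which names are accepted, assuming the model ships with built-in speakers at all.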

To Reproduce

Run either snippet above with or without the speaker parameter; both cases produce the error.

Expected behavior

No response

Logs

Environment

docker

Additional context

No response

apachexc added the bug label May 6, 2025
eginhard (Contributor) commented May 6, 2025

The server from this repo doesn't support the XTTS model. You can use our fork (available via pip install coqui-tts) and corresponding docker images instead.

@apachexc (Author)

@eginhard

Hello, I am already running the idiap fork in Docker. The instructions for the idiap fork show that a voice can be cloned with the following code:

# TTS with list of amplitude values as output, clone the voice from speaker_wav
wav = tts.tts(
    text="Hello world!",
    speaker_wav="my/cloning/audio.wav",
    language="en"
)

However, server.py has no function that supports this way of cloning a voice. Is there a way to implement it in server.py?
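
One possible approach, sketched under assumptions: let the handler read an extra request parameter for the reference audio and forward it as `speaker_wav`. The helper below is hypothetical (`build_tts_kwargs` and the `speaker_wav` request field are not part of server.py). It only builds keyword arguments, using the `text`, `speaker`, `speaker_wav`, and `language` names that appear in the calls quoted in this thread.

```python
def build_tts_kwargs(params):
    """Map request parameters (a plain dict) to keyword arguments for tts.tts().

    `params` stands in for the merged header/form values of a request.
    The "speaker_wav" field is an assumption, not an existing server option.
    """
    text = params.get("text", "")
    if not text:
        raise ValueError("text is required")

    kwargs = {"text": text, "language": params.get("language_id") or None}
    speaker_wav = params.get("speaker_wav", "")
    speaker_id = params.get("speaker_id", "")
    if speaker_wav:
        # Clone from reference audio; do not send a speaker name as well.
        kwargs["speaker_wav"] = speaker_wav
    elif speaker_id:
        kwargs["speaker"] = speaker_id
    else:
        raise ValueError("provide either speaker_wav or speaker_id")
    return kwargs
```

Inside a handler this could be used as `tts.tts(**build_tts_kwargs(request.values))`. Whether the fork's own server already offers an equivalent option is not established in this thread.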

@apachexc (Author)

XTTS requires a speaker_id.

If I provide a speaker id, the cloned output uses that speaker's voice, not the voice from the provided speaker_wav.

If no speaker_id is provided, an error occurs. For example:
wav = api.tts(
    text=text,
    speaker_wav=speaker_wav,
    language=language
)

How can I solve this?

@apachexc (Author)

XTTS requires a speaker_id.
If I provide a speaker id, the cloned output uses that speaker's voice, not the voice from the provided speaker_wav.
Example:
wav = api.tts(text, speaker=speaker_idx, language=language_idx, style_wav=style_wav)
If no speaker_id is provided, an error occurs.
Example:
wav = api.tts(
    text=text,
    speaker_wav=speaker_wav,
    language=language
)
How can I solve this?
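
The behavior described here is consistent with a synthesis routine that checks for a speaker id first and only falls back to the reference audio when no id is set, which is what the lookup in the traceback suggests. The sketch below is a simplified model of that precedence, not the library's code; `normalize_speaker` and `choose_voice_source` are hypothetical names. It illustrates why an empty or default speaker value may need to become `None` before a cloning request is made.

```python
def normalize_speaker(speaker_id):
    """Treat empty strings (e.g. a missing request parameter) as 'no speaker'."""
    return speaker_id or None


def choose_voice_source(speaker_id, speaker_wav):
    """Simplified precedence model: a speaker id wins over reference audio."""
    if speaker_id is not None:
        return ("builtin", speaker_id)
    if speaker_wav:
        return ("clone", speaker_wav)
    raise ValueError("neither speaker id nor speaker_wav was given")


# An empty string is not None, so it would still take the built-in branch.
print(choose_voice_source("", "ref.wav"))                     # ('builtin', '')
print(choose_voice_source(normalize_speaker(""), "ref.wav"))  # ('clone', 'ref.wav')
```

If the real code path follows this model, a server that fills in a default speaker name whenever none is sent would never reach the cloning branch, which may explain both symptoms reported above.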
