[Python] Speech recognizer stops continuous recognition eventually by itself #2760

Open
Hugeldugelking opened this issue Feb 27, 2025 · 5 comments

Comments

@Hugeldugelking
Hello, a bit of context before the bug description:

I am trying to set up a Python WebSocket server that clients (e.g. web-based) can connect to in order to stream speech from the device's microphone. Because I want the speech data to be sent from the client to the server encoded as an Opus stream, I need to convert the stream to PCM.

For that, I am using GStreamer (as explained in the docs) and a PushAudioInputStream. Since I need to support numerous voice streams on a single server instance that may run for a long time, I am testing with several open browser windows streaming microphone input for long periods of time.

IN ORDER TO ASSIST YOU, PLEASE PROVIDE THE FOLLOWING:

  • Speech SDK log taken from a run that exhibits the reported issue.
    See instructions on how to take logs.
    appended at the end

  • A stripped down, simplified version of your source code that exhibits the issue. Or, preferably, try to reproduce the problem with one of the public samples in this repository (or a minimally modified version of it), and share the code.
    appended at the end

Describe the bug

On my local machine (in a Docker container) I am able to transcribe tens of streams simultaneously for hours without any problems.

But when I deploy the image to an Azure App Service and run 10 streams concurrently, each session is stopped automatically after somewhere between 10 and 30 minutes. The session is not canceled, as I would expect to happen in case of an error; it is simply stopped without any reason given.

Based on the script, I get the following console output: SESSION STOPPED SessionEventArgs(session_id=xyz)

To Reproduce

Upload a Docker image with the script below to Azure App Service (or possibly any device with limited CPU/memory resources?) and start multiple streams. Then wait; the streams should stop on their own.

Expected behavior

Continuous recognition should not stop until I explicitly tell the recognizer to do so using .stop_continuous_recognition(). I expect recognition to keep working for hours.

Version of the Cognitive Services Speech SDK

azure-cognitiveservices-speech 1.42.0

Platform, Operating System, and Programming Language

  • OS: Windows/Linux in a Docker container with python:3.13-slim base image
  • Hardware - x64
  • Programming language: Python

Additional context

n/a

The script:

import asyncio, time, logging
import websockets
import azure.cognitiveservices.speech as speechsdk

logger = logging.getLogger("voice")
logger.setLevel(logging.DEBUG)  # Set the logging level

# Replace these with your actual Azure Speech Service credentials
SPEECH_KEY = ""
SERVICE_REGION = ""
ENDPOINT_ID = ""

async def handle_client(websocket):
    """
    Handles incoming WebSocket connections and streams audio data to Azure Speech SDK.
    Sends back transcriptions to the client.
    """
    logger.info(f"New client connected from {websocket.remote_address}")
    tasks = set()

    compressed_format = speechsdk.audio.AudioStreamFormat(compressed_stream_format=speechsdk.AudioStreamContainerFormat.ANY)
    # Initialize PushAudioInputStream
    push_stream = speechsdk.audio.PushAudioInputStream(stream_format=compressed_format)

    # Configure Speech SDK
    speech_config = speechsdk.SpeechConfig(subscription=SPEECH_KEY, region=SERVICE_REGION)
    if ENDPOINT_ID:
        speech_config.endpoint_id = ENDPOINT_ID

    speech_config.set_property(speechsdk.PropertyId.Speech_LogFilename, f"/var/log/voice/{int(time.time())}.log")
    audio_config = speechsdk.audio.AudioConfig(stream=push_stream)
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

    queue = asyncio.Queue()

    async def send_recognized_text():
        while True:
            text = await queue.get()
            if text:
                await websocket.send(text)

    task = asyncio.create_task(send_recognized_text())
    tasks.add(task)
    loop = asyncio.get_running_loop()  # preferred over get_event_loop() inside a coroutine

    def recognizing_handler(event):
        logger.info(f"RECOGNIZING: {event.result.text}")
        queue.put_nowait(event.result.text)

    def recognized_handler(event):
        logger.info(f"RECOGNIZED: {event.result.text}")
        queue.put_nowait(event.result.text)

    def session_started_handler(event):
        logger.info("SESSION STARTED")

    def session_stopped_handler(event):
        logger.info("SESSION STOPPED {}".format(event))
        asyncio.run_coroutine_threadsafe(websocket.close(), loop).result()
        logger.info("WebSocket closed")

    def canceled_handler(event):
        logger.info(f"CANCELED: {event.reason}")
        asyncio.run_coroutine_threadsafe(websocket.close(), loop).result()
        logger.info("WebSocket closed")

    # Connect event handlers
    speech_recognizer.recognizing.connect(recognizing_handler)
    speech_recognizer.recognized.connect(recognized_handler)
    speech_recognizer.session_started.connect(session_started_handler)
    speech_recognizer.session_stopped.connect(session_stopped_handler)
    speech_recognizer.canceled.connect(canceled_handler)

    # Start continuous recognition
    speech_recognizer.start_continuous_recognition()
    logger.info("Speech recognition started")

    try:
        async for message in websocket:
            if isinstance(message, bytes):
                push_stream.write(message)
            else:
                logger.info("Received non-binary message. Ignoring.")
    except websockets.exceptions.ConnectionClosed as e:
        logger.info(f"Client disconnected: {e}")
    except Exception as e:
        logger.info(f"Error: {e}")
    finally:
        # Clean up
        for task in tasks:
            task.cancel()
        speech_recognizer.stop_continuous_recognition()
        push_stream.close()
        logger.info("Speech recognition stopped")

async def main():
    """
    Starts the WebSocket server.
    """
    server = await websockets.serve(handle_client, "0.0.0.0", 8765)
    logger.info("WebSocket server started on ws://0.0.0.0:8765")
    await server.wait_closed()

if __name__ == "__main__":
    try:
        asyncio.run(main())
    except KeyboardInterrupt:
        logger.info("Server shutdown requested by user")

The SDK's log:
As the log file is 500 MB in size, I cut out the last 2000 lines and uploaded them to Pastebin.


@rhurey
Member

rhurey commented Mar 20, 2025

There's a session stopped event in the log for session 53d8749ba2374f6aa21dbffd94a78192 that reported it had processed all the audio and was stopping. It should have raised a canceled event with an end of stream reason.
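For reference, the reason a canceled handler would surface in that case can be sketched as follows. This is a minimal, hedged sketch: `CancellationReason` here is a stand-in enum for `speechsdk.CancellationReason` (which defines `Error`, `EndOfStream`, and `CancelledByUser`), so the dispatch logic can be shown without the SDK; a real handler would read the event's `cancellation_details`.

```python
from enum import Enum

# Stand-in for speechsdk.CancellationReason (assumed values), used so the
# handler logic below is self-contained and runnable without the SDK.
class CancellationReason(Enum):
    Error = 1
    EndOfStream = 2
    CancelledByUser = 3

def describe_cancellation(reason, error_details=""):
    """Map a cancellation reason to a log-friendly explanation."""
    if reason is CancellationReason.EndOfStream:
        return "end of stream: the service believes all pushed audio was consumed"
    if reason is CancellationReason.Error:
        return f"error: {error_details}"
    return "canceled by user"
```

Logging the reason this way in the canceled handler would make it clear whether the service is treating the push stream as drained.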

@Hugeldugelking
Author

Thank you for looking into this.

Well, I never saw any canceled events in my logs; it was always a session stopped event.
And how could all the audio have been processed? I am continuously pushing new audio data, received over the WebSocket, to the stream every 200-400 ms.
It may be notable that I have a webpage that accesses the microphone through the browser and sends the chunked webm/opus data over the WebSocket. If I start multiple streams in multiple tabs, the sessions also stop more or less simultaneously (within a few seconds of each other).
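One way to rule out stream starvation on the constrained App Service instance would be to track the gap since the last push_stream.write() and log when it exceeds a threshold. A minimal sketch, where PushWatchdog is a hypothetical helper (not part of the SDK):

```python
import time

class PushWatchdog:
    """Hypothetical helper: records when audio was last pushed so that
    starvation (long gaps between push_stream.write() calls) can be spotted
    in the logs before a session stops."""
    def __init__(self, max_gap_s: float = 1.0):
        self.max_gap_s = max_gap_s
        self.last_push = time.monotonic()

    def note_push(self) -> None:
        # Call this right after each push_stream.write(chunk).
        self.last_push = time.monotonic()

    def gap(self) -> float:
        return time.monotonic() - self.last_push

    def starved(self) -> bool:
        return self.gap() > self.max_gap_s
```

Logging `watchdog.gap()` from the session_stopped handler would show whether the stops correlate with write gaps.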

@wxj127

wxj127 commented May 13, 2025

On the web side I use webm/opus, sending audio chunks every 180 ms; on the server side I use the Java SDK, and I have already filtered out the case where the byte[] length is zero. Even though the PushAudioInputStream is never closed, EndOfStream events still appear from time to time. AudioStreamFormat uses AudioStreamContainerFormat.ANY. What is the cause of this problem?
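The zero-length filter described above can be sketched in Python terms (a hypothetical helper, shown in Python since this issue's repro script is Python): only non-empty binary frames are forwarded to the push stream.

```python
def usable_audio_chunks(messages):
    """Yield only non-empty binary frames; skip text frames and zero-length
    byte strings before they reach push_stream.write()."""
    for m in messages:
        if isinstance(m, (bytes, bytearray)) and len(m) > 0:
            yield bytes(m)
```

In the repro script's receive loop this would replace the isinstance check, e.g. iterating `usable_audio_chunks(...)` over incoming messages before calling push_stream.write().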

3 participants