[Python] Speech recognizer stops continuous recognition eventually by itself #2760

Open
Hugeldugelking opened this issue Feb 27, 2025 · 5 comments

Comments

@Hugeldugelking
Hello, a bit of context before the bug description:

I am trying to set up a Python WebSocket server that clients (e.g. web-based) can connect to in order to stream speech from the device's microphone. Because I want the speech data to be sent from the client to the server encoded as an Opus stream, I need to convert the stream to PCM.

For that, I am using GStreamer (as explained in the docs) and a PushAudioInputStream. Since I need to support numerous voice streams on a single server instance that may run for a long time, I am testing with several open browser windows streaming microphone input for long periods of time.

IN ORDER TO ASSIST YOU, PLEASE PROVIDE THE FOLLOWING:

  • Speech SDK log taken from a run that exhibits the reported issue.
    See instructions on how to take logs.
    appended at the end

  • A stripped down, simplified version of your source code that exhibits the issue. Or, preferably, try to reproduce the problem with one of the public samples in this repository (or a minimally modified version of it), and share the code.
    appended at the end

Describe the bug

On my local machine (in a Docker container) I am able to transcribe tens of streams simultaneously for hours without any problems.

But when I deploy the image to an Azure App Service and run 10 streams concurrently, each session is stopped automatically after somewhere between 10 and 30 minutes. The session is not canceled, as I would expect to happen in case of an error; it is simply stopped without any reason given.

Based on the script, I get the following console output: SESSION STOPPED SessionEventArgs(session_id=xyz)

To Reproduce

Upload a Docker image with the script below to Azure App Service (or possibly any device with limited CPU/memory resources?) and start multiple streams. Then wait; the streams should stop on their own.

Expected behavior

Continuous recognition should not stop until I explicitly tell the recognizer to do so using .stop_continuous_recognition(). I expect recognition to keep working for hours.

Version of the Cognitive Services Speech SDK

azure-cognitiveservices-speech 1.42.0

Platform, Operating System, and Programming Language

  • OS: Windows/Linux in a Docker container with python:3.13-slim base image
  • Hardware - x64
  • Programming language: Python

Additional context

n/a

The script:

import asyncio, time, logging
import websockets
import azure.cognitiveservices.speech as speechsdk

logger = logging.getLogger("voice")
logger.setLevel(logging.DEBUG)  # Set the logging level

# Replace these with your actual Azure Speech Service credentials
SPEECH_KEY = ""
SERVICE_REGION = ""
ENDPOINT_ID = ""

async def handle_client(websocket):
    """
    Handles incoming WebSocket connections and streams audio data to Azure Speech SDK.
    Sends back transcriptions to the client.
    """
    logger.info(f"New client connected from {websocket.remote_address}")
    tasks = set()

    compressed_format = speechsdk.audio.AudioStreamFormat(compressed_stream_format=speechsdk.AudioStreamContainerFormat.ANY)
    # Initialize PushAudioInputStream
    push_stream = speechsdk.audio.PushAudioInputStream(stream_format=compressed_format)

    # Configure Speech SDK
    speech_config = speechsdk.SpeechConfig(subscription=SPEECH_KEY, region=SERVICE_REGION)
    if ENDPOINT_ID:
        speech_config.endpoint_id = ENDPOINT_ID

    speech_config.set_property(speechsdk.PropertyId.Speech_LogFilename, f"/var/log/voice/{int(time.time())}.log")
    audio_config = speechsdk.audio.AudioConfig(stream=push_stream)
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

    queue = asyncio.Queue()

    async def send_recognized_text():
        while True:
            text = await queue.get()
            if text:
                await websocket.send(text)

    task = asyncio.create_task(send_recognized_text())
    tasks.add(task)
    loop = asyncio.get_running_loop()  # preferred over get_event_loop() inside a coroutine

    def recognizing_handler(event):
        logger.info(f"RECOGNIZING: {event.result.text}")
        queue.put_nowait(event.result.text)

    def recognized_handler(event):
        logger.info(f"RECOGNIZED: {event.result.text}")
        queue.put_nowait(event.result.text)

    def session_started_handler(event):
        logger.info("SESSION STARTED")

    def session_stopped_handler(event):
        logger.info("SESSION STOPPED {}".format(event))
        asyncio.run_coroutine_threadsafe(websocket.close(), loop).result()
        logger.info("WebSocket closed")

    def canceled_handler(event):
        logger.info(f"CANCELED: {event.reason}")
        asyncio.run_coroutine_threadsafe(websocket.close(), loop).result()
        logger.info("WebSocket closed")

    # Connect event handlers
    speech_recognizer.recognizing.connect(recognizing_handler)
    speech_recognizer.recognized.connect(recognized_handler)
    speech_recognizer.session_started.connect(session_started_handler)
    speech_recognizer.session_stopped.connect(session_stopped_handler)
    speech_recognizer.canceled.connect(canceled_handler)

    # Start continuous recognition
    speech_recognizer.start_continuous_recognition()
    logger.info("Speech recognition started")

    try:
        async for message in websocket:
            if isinstance(message, bytes):
                push_stream.write(message)
            else:
                logger.info("Received non-binary message. Ignoring.")
    except websockets.exceptions.ConnectionClosed as e:
        logger.info(f"Client disconnected: {e}")
    except Exception as e:
        logger.info(f"Error: {e}")
    finally:
        # Clean up
        for task in tasks:
            task.cancel()
        speech_recognizer.stop_continuous_recognition()
        push_stream.close()
        logger.info("Speech recognition stopped")

async def main():
    """
    Starts the WebSocket server.
    """
    server = await websockets.serve(handle_client, "0.0.0.0", 8765)
    logger.info("WebSocket server started on ws://0.0.0.0:8765")
    await server.wait_closed()

if __name__ == "__main__":
    try:
        asyncio.run(main())
    except KeyboardInterrupt:
        logger.info("Server shutdown requested by user")

The SDK's log:
As the log file is 500 MB in size, I cut out the last 2000 lines and uploaded them to Pastebin.


@rhurey
Member

rhurey commented Mar 20, 2025

There's a session stopped event in the log for session 53d8749ba2374f6aa21dbffd94a78192 that reported it had processed all the audio and was stopping. It should have raised a canceled event with an end of stream reason.
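For reference, the reason a canceled handler would surface in that case can be sketched as follows. This is a minimal, hedged sketch: `CancellationReason` here is a stand-in enum for `speechsdk.CancellationReason` (which defines `Error`, `EndOfStream`, and `CancelledByUser`), so the dispatch logic can be shown without the SDK; a real handler would read the event's `cancellation_details`.

```python
from enum import Enum

# Stand-in for speechsdk.CancellationReason (assumed values), used so the
# handler logic below is self-contained and runnable without the SDK.
class CancellationReason(Enum):
    Error = 1
    EndOfStream = 2
    CancelledByUser = 3

def describe_cancellation(reason, error_details=""):
    """Map a cancellation reason to a log-friendly explanation."""
    if reason is CancellationReason.EndOfStream:
        return "end of stream: the service believes all pushed audio was consumed"
    if reason is CancellationReason.Error:
        return f"error: {error_details}"
    return "canceled by user"
```

Logging the reason this way in the canceled handler would make it clear whether the service is treating the push stream as drained.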

@Hugeldugelking
Author

Thank you for looking into this.

Well, I never saw any canceled events in my logs; it was always a session stopped event.
And how could all the audio have been processed? I am continuously pushing new audio data, received over the WebSocket, to the stream every 200-400 ms.
It may be notable that I have a webpage that accesses the microphone through the browser and sends the chunked webm/opus data over the WebSocket. If I start multiple streams in multiple tabs, the sessions also stop more or less simultaneously (within a few seconds of each other).
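One way to rule out stream starvation on the constrained App Service instance would be to track the gap since the last push_stream.write() and log when it exceeds a threshold. A minimal sketch, where PushWatchdog is a hypothetical helper (not part of the SDK):

```python
import time

class PushWatchdog:
    """Hypothetical helper: records when audio was last pushed so that
    starvation (long gaps between push_stream.write() calls) can be spotted
    in the logs before a session stops."""
    def __init__(self, max_gap_s: float = 1.0):
        self.max_gap_s = max_gap_s
        self.last_push = time.monotonic()

    def note_push(self) -> None:
        # Call this right after each push_stream.write(chunk).
        self.last_push = time.monotonic()

    def gap(self) -> float:
        return time.monotonic() - self.last_push

    def starved(self) -> bool:
        return self.gap() > self.max_gap_s
```

Logging `watchdog.gap()` from the session_stopped handler would show whether the stops correlate with write gaps.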

@wxj127

wxj127 commented May 13, 2025

On the web side I use webm/opus, sending audio chunks every 180 ms; on the server side I use the Java SDK, and I have already filtered out the case where the byte[] length is zero. Even though the PushAudioInputStream is never closed, EndOfStream events still appear from time to time. AudioStreamFormat uses AudioStreamContainerFormat.ANY. What is the cause of this problem?
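The zero-length filter described above can be sketched in Python terms (a hypothetical helper, shown in Python since this issue's repro script is Python): only non-empty binary frames are forwarded to the push stream.

```python
def usable_audio_chunks(messages):
    """Yield only non-empty binary frames; skip text frames and zero-length
    byte strings before they reach push_stream.write()."""
    for m in messages:
        if isinstance(m, (bytes, bytearray)) and len(m) > 0:
            yield bytes(m)
```

In the repro script's receive loop this would replace the isinstance check, e.g. iterating `usable_audio_chunks(...)` over incoming messages before calling push_stream.write().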

3 participants