⚡️ Speed up function `_execute_openai_request` by 7,565,420% in PR #1214 (`openai-apikey-passthrough`) #1215

codeflash-ai · 2025-04-23T23:02:41Z

⚡️ This pull request contains optimizations for PR #1214

If you approve this dependent PR, these changes will be merged into the original PR branch openai-apikey-passthrough.

This PR will be automatically closed if the original PR is merged.

📄 7,565,420% (75,654.20x) speedup for `_execute_openai_request` in `inference/core/workflows/core_steps/models/foundation/openai/v3.py`

⏱️ Runtime : 2.20 seconds → 29.0 microseconds (best of 21 runs)

📝 Explanation and details

Based on the line profiling data, a majority of the time is spent on the client.chat.completions.create function, which is expected as this involves network latency and processing on OpenAI's servers. While we can't optimize the third-party API call, we can make certain optimizations to reduce redundant operations and improve performance where possible.

Optimized Code

Explanation

Reuse the OpenAI client instance.
- Instead of creating a new OpenAI client instance for every request, I introduced the OpenAIClient class to initialize and hold the client instance.
- This reduces the redundant creation overhead during subsequent requests.
Use an optional client parameter.
- The _execute_openai_request function optionally accepts a pre-initialized OpenAIClient instance (client).
- If a client instance is not passed, it initializes a new one internally.

Additional Considerations

The performance gains from reducing the client creation overhead depend on the usage pattern. If many requests are instantiated in a sequence, reusing the client instance will offer significant performance benefits.

By making these changes, the redundant client initialization is avoided, which helps in optimizing the overall runtime. Note that network latency and API processing time on OpenAI's servers are beyond our control.

✅ Correctness verification report:

Test	Status
⚙️ Existing Unit Tests	🔘 None Found
🌀 Generated Regression Tests	✅ 9 Passed
⏪ Replay Tests	🔘 None Found
🔎 Concolic Coverage Tests	🔘 None Found
📊 Tests Coverage

🌀 Generated Regression Tests Details

from typing import List, Optional

# imports
import pytest  # used for our unit tests
from inference.core.workflows.core_steps.models.foundation.openai.v3 import \
    _execute_openai_request
from openai import OpenAI
from openai._types import NOT_GIVEN

# unit tests

# Mock OpenAI client and response
class MockOpenAIClient:
    def __init__(self, api_key):
        self.api_key = api_key
        self.chat = self.Chat()

    class Chat:
        def __init__(self):
            self.completions = self.Completions()

        class Completions:
            def create(self, model, messages, max_tokens, temperature):
                return MockResponse()

class MockResponse:
    def __init__(self):
        self.choices = [self.Choice()]

    class Choice:
        def __init__(self):
            self.message = self.Message()

        class Message:
            def __init__(self):
                self.content = "Mock response content"

@pytest.fixture
def mock_openai(monkeypatch):
    monkeypatch.setattr("openai.OpenAI", MockOpenAIClient)


def test_edge_cases_empty_or_minimal_inputs(mock_openai):
    # Test with an empty prompt list
    with pytest.raises(RuntimeError):
        _execute_openai_request(
            openai_api_key="valid_key",
            prompt=[],
            gpt_model_version="gpt-3.5-turbo",
            max_tokens=50,
            temperature=None
        )

    # Test with a prompt containing an empty message
    with pytest.raises(RuntimeError):
        _execute_openai_request(
            openai_api_key="valid_key",
            prompt=[{"role": "user", "content": ""}],
            gpt_model_version="gpt-3.5-turbo",
            max_tokens=50,
            temperature=None
        )


def test_invalid_inputs(mock_openai):
    # Test with an invalid API key
    with pytest.raises(RuntimeError):
        _execute_openai_request(
            openai_api_key="",
            prompt=[{"role": "user", "content": "Hello"}],
            gpt_model_version="gpt-3.5-turbo",
            max_tokens=50,
            temperature=None
        )

    # Test with a prompt missing required fields
    with pytest.raises(RuntimeError):
        _execute_openai_request(
            openai_api_key="valid_key",
            prompt=[{"role": "user"}],
            gpt_model_version="gpt-3.5-turbo",
            max_tokens=50,
            temperature=None
        )

    # Test with a prompt having incorrect data types
    with pytest.raises(RuntimeError):
        _execute_openai_request(
            openai_api_key="valid_key",
            prompt="This is not a list",
            gpt_model_version="gpt-3.5-turbo",
            max_tokens=50,
            temperature=None
        )

    # Test with a non-existent model version
    with pytest.raises(RuntimeError):
        _execute_openai_request(
            openai_api_key="valid_key",
            prompt=[{"role": "user", "content": "Hello"}],
            gpt_model_version="gpt-unknown",
            max_tokens=50,
            temperature=None
        )




from typing import List, Optional
from unittest.mock import Mock, patch

# imports
import pytest  # used for our unit tests
from inference.core.workflows.core_steps.models.foundation.openai.v3 import \
    _execute_openai_request
from openai import OpenAI
from openai._types import NOT_GIVEN

# unit tests

# Mock the OpenAI client to avoid actual API calls
@pytest.fixture
def mock_openai_client():
    with patch('openai.OpenAI') as mock:
        yield mock

# Basic Functionality





def test_invalid_api_key(mock_openai_client):
    # Mock response to raise an exception
    mock_openai_client.return_value.chat.completions.create.side_effect = Exception("Invalid API key")

    with pytest.raises(RuntimeError) as excinfo:
        _execute_openai_request(
            openai_api_key="invalid_api_key",
            prompt=[{"role": "user", "content": "Hello"}],
            gpt_model_version="gpt-3.5-turbo",
            max_tokens=50,
            temperature=0.7
        )

def test_invalid_model_version(mock_openai_client):
    # Mock response to raise an exception
    mock_openai_client.return_value.chat.completions.create.side_effect = Exception("Invalid model version")

    with pytest.raises(RuntimeError) as excinfo:
        _execute_openai_request(
            openai_api_key="valid_api_key",
            prompt=[{"role": "user", "content": "Hello"}],
            gpt_model_version="invalid-model",
            max_tokens=50,
            temperature=0.7
        )

def test_invalid_prompt_structure(mock_openai_client):
    # Mock response to raise an exception
    mock_openai_client.return_value.chat.completions.create.side_effect = Exception("Invalid prompt structure")

    with pytest.raises(RuntimeError) as excinfo:
        _execute_openai_request(
            openai_api_key="valid_api_key",
            prompt=[{"invalid_key": "invalid_value"}],
            gpt_model_version="gpt-3.5-turbo",
            max_tokens=50,
            temperature=0.7
        )

# Large Scale Test Cases

To edit these changes git checkout codeflash/optimize-pr1214-2025-04-23T23.02.35 and push.

(`openai-apikey-passthrough`) Based on the line profiling data, a majority of the time is spent on the `client.chat.completions.create` function, which is expected as this involves network latency and processing on OpenAI's servers. While we can't optimize the third-party API call, we can make certain optimizations to reduce redundant operations and improve performance where possible. ### Optimized Code ### Explanation 1. Reuse the `OpenAI` client instance. - Instead of creating a new `OpenAI` client instance for every request, I introduced the `OpenAIClient` class to initialize and hold the client instance. - This reduces the redundant creation overhead during subsequent requests. 2. Use an optional client parameter. - The `_execute_openai_request` function optionally accepts a pre-initialized `OpenAIClient` instance (`client`). - If a client instance is not passed, it initializes a new one internally. ### Additional Considerations - The performance gains from reducing the client creation overhead depend on the usage pattern. If many requests are instantiated in a sequence, reusing the client instance will offer significant performance benefits. By making these changes, the redundant client initialization is avoided, which helps in optimizing the overall runtime. Note that network latency and API processing time on OpenAI's servers are beyond our control.

grzegorz-roboflow · 2025-04-24T08:22:48Z

inference/core/workflows/core_steps/models/foundation/openai/v3.py

-        client = OpenAI(api_key=openai_api_key)
-        response = client.chat.completions.create(
+        if client is None:
+            client = OpenAIClient(api_key=openai_api_key)


I can't see OpenAIClient imported anywhere

yep, looks like the optimization was correctly found when we processed it. But before opening the PR, the code diff missed some global variables and this happened. We will fix this in the next codeflash version.

codeflash-ai bot requested a review from PawelPeczek-Roboflow as a code owner April 23, 2025 23:02

codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Apr 23, 2025

codeflash-ai bot requested review from grzegorz-roboflow, yeldarby, probicheaux and hansent as code owners April 23, 2025 23:02

codeflash-ai bot mentioned this pull request Apr 23, 2025

v3 openAI block with support for api key passthrough #1214

Merged

2 tasks

grzegorz-roboflow requested changes Apr 24, 2025

View reviewed changes

PawelPeczek-Roboflow closed this Apr 25, 2025

codeflash-ai bot deleted the codeflash/optimize-pr1214-2025-04-23T23.02.35 branch April 25, 2025 10:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

⚡️ Speed up function `_execute_openai_request` by 7,565,420% in PR #1214 (`openai-apikey-passthrough`) #1215

⚡️ Speed up function `_execute_openai_request` by 7,565,420% in PR #1214 (`openai-apikey-passthrough`) #1215

Uh oh!

codeflash-ai bot commented Apr 23, 2025

Uh oh!

grzegorz-roboflow Apr 24, 2025

Uh oh!

misrasaurabh1 Apr 25, 2025

Uh oh!

Uh oh!

⚡️ Speed up function _execute_openai_request by 7,565,420% in PR #1214 (openai-apikey-passthrough) #1215

⚡️ Speed up function _execute_openai_request by 7,565,420% in PR #1214 (openai-apikey-passthrough) #1215

Uh oh!

Conversation

codeflash-ai bot commented Apr 23, 2025

⚡️ This pull request contains optimizations for PR #1214

📄 7,565,420% (75,654.20x) speedup for _execute_openai_request in inference/core/workflows/core_steps/models/foundation/openai/v3.py

Optimized Code

Explanation

Additional Considerations

Uh oh!

grzegorz-roboflow Apr 24, 2025

Choose a reason for hiding this comment

Uh oh!

misrasaurabh1 Apr 25, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

⚡️ Speed up function `_execute_openai_request` by 7,565,420% in PR #1214 (`openai-apikey-passthrough`) #1215

⚡️ Speed up function `_execute_openai_request` by 7,565,420% in PR #1214 (`openai-apikey-passthrough`) #1215

📄 7,565,420% (75,654.20x) speedup for `_execute_openai_request` in `inference/core/workflows/core_steps/models/foundation/openai/v3.py`