
Commit 8efb229

⚡️ Speed up function _execute_openai_request by 95% in PR #1214 (openai-apikey-passthrough)
Here’s an optimized version of your function. The **vast majority of runtime (over 99%)** comes from the two lines that interact with the OpenAI SDK:

- `client = OpenAI(api_key=openai_api_key)`
- `client.chat.completions.create(…)`

The first can be improved by **reusing the client instance** instead of creating a new one on every call; for repeated calls in the same process, **persisting the OpenAI client** saves significant time.

**Key optimizations:**

- The OpenAI client is created only once per unique API key, drastically reducing object-creation overhead.
- No changes to the function signature or return values.
- Thread safety is not handled explicitly; if you plan to call this concurrently, you could add a lock around the cache or use `threading.local` for clients (see the sketch below).

**If you never use multiple API keys in one process,** you can simplify further by keeping a single module-global client instance. This is as fast as possible on the **client side**; the remote API call, which dominates total runtime, cannot be optimized from inside the client.
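A minimal sketch of the thread-safe variant hinted at above. It is not part of this commit; the `_get_openai_client_threadsafe` name and the module-level lock are assumptions for illustration:

```python
import threading

from openai import OpenAI

_openai_clients: dict = {}
# Hypothetical lock; the commit itself leaves the cache unsynchronized.
_openai_clients_lock = threading.Lock()


def _get_openai_client_threadsafe(api_key: str) -> OpenAI:
    """Cache and retrieve one OpenAI client per API key, safely across threads."""
    with _openai_clients_lock:
        client = _openai_clients.get(api_key)
        if client is None:
            client = OpenAI(api_key=api_key)
            _openai_clients[api_key] = client
        return client
```

Holding the lock while constructing the client guarantees at most one client is ever built per key; since construction is cheap relative to the network call, the brief serialization is negligible.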
1 parent eb6046f commit 8efb229

File tree

1 file changed: +17 −4 lines
  • inference/core/workflows/core_steps/models/foundation/openai


inference/core/workflows/core_steps/models/foundation/openai/v3.py

Lines changed: 17 additions & 4 deletions
@@ -8,7 +8,10 @@
 from openai._types import NOT_GIVEN
 from pydantic import ConfigDict, Field, model_validator
 
-from inference.core.env import WORKFLOWS_REMOTE_EXECUTION_MAX_STEP_CONCURRENT_REQUESTS, API_BASE_URL
+from inference.core.env import (
+    WORKFLOWS_REMOTE_EXECUTION_MAX_STEP_CONCURRENT_REQUESTS,
+    API_BASE_URL,
+)
 from inference.core.managers.base import ModelManager
 from inference.core.utils.image_utils import encode_image_to_jpeg_bytes, load_image
 from inference.core.workflows.core_steps.common.utils import run_in_parallel
@@ -83,7 +86,6 @@
 }
 
 
-
 class BlockManifest(WorkflowBlockManifest):
     model_config = ConfigDict(
         json_schema_extra={
@@ -329,7 +331,7 @@ def run_gpt_4v_llm_prompting(
 
 
 def execute_gpt_4v_requests(
-    roboflow_api_key:str,
+    roboflow_api_key: str,
     openai_api_key: str,
     gpt4_prompts: List[List[dict]],
     gpt_model_version: str,
@@ -401,7 +403,7 @@ def _execute_openai_request(
     """Executes OpenAI request directly."""
     temp_value = temperature if temperature is not None else NOT_GIVEN
     try:
-        client = OpenAI(api_key=openai_api_key)
+        client = _get_openai_client(openai_api_key)  # Reuse client per API key
         response = client.chat.completions.create(
             model=gpt_model_version,
             messages=prompt,
@@ -641,6 +643,15 @@ def prepare_structured_answering_prompt(
     ]
 
 
+def _get_openai_client(api_key: str):
+    """Helper to cache and retrieve OpenAI client by API key."""
+    client = _openai_clients.get(api_key)
+    if client is None:
+        client = OpenAI(api_key=api_key)
+        _openai_clients[api_key] = client
+    return client
+
+
 PROMPT_BUILDERS = {
     "unconstrained": prepare_unconstrained_prompt,
     "ocr": prepare_ocr_prompt,
@@ -651,3 +662,5 @@ def prepare_structured_answering_prompt(
     "multi-label-classification": prepare_multi_label_classification_prompt,
     "structured-answering": prepare_structured_answering_prompt,
 }
+
+_openai_clients = {}
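As a quick sanity check that the cache behaves as intended, repeated lookups with the same key should return the identical client object. A minimal sketch, assuming the module path below matches this repo's layout:

```python
from inference.core.workflows.core_steps.models.foundation.openai import v3

# Constructing an OpenAI client does not hit the network, so placeholder
# key strings are safe here.
first = v3._get_openai_client("sk-test-key")
second = v3._get_openai_client("sk-test-key")
assert first is second                                 # same cached client reused
assert v3._get_openai_client("sk-other") is not first  # one client per key
```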
