
v3 openAI block with support for api key passthrough #1214


Draft · wants to merge 11 commits into main

Conversation

@hansent (Contributor) commented Apr 23, 2025

Description

Draft V3 OpenAI block with support for API key passthrough via the Roboflow API.

Type of change

  • New feature (non-breaking change which adds functionality)

How has this change been tested? Please provide a test case or an example of how you tested the change.

locally

Any specific deployment considerations

The proxy / Roboflow API key feature is not live yet.

Docs

  • Docs updated? What were the changes:

model_version: Union[
    Selector(kind=[STRING_KIND]), Literal["gpt-4o", "gpt-4o-mini"]
] = Field(
    default="gpt-4o",

Contributor commented:

Default to gpt-4.1 maybe, since it's the flagship model now.

    examples=["auto", "high", "low"],
)
max_tokens: int = Field(
    default=450,

Contributor commented:

I think we should default to None to avoid problems with reasoning models, since we don't give users the finish reason.

response = client.chat.completions.create(
    model=gpt_model_version,
    messages=prompt,
    max_tokens=max_tokens,

Contributor commented:

OpenAI deprecated this parameter. There's a new one called max_completion_tokens. Both work with "older" models, but only the new one works with reasoning models.

https://platform.openai.com/docs/api-reference/chat/create#chat-create-max_completion_tokens
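
A minimal sketch of what the suggested change could look like, reusing the names from the excerpt above (the actual block code in this PR may differ):

```python
# Sketch only: swap the deprecated max_tokens parameter for max_completion_tokens.
# The variables (client, gpt_model_version, prompt, max_tokens) are assumed to be
# the same ones as in the excerpt above.
response = client.chat.completions.create(
    model=gpt_model_version,
    messages=prompt,
    max_completion_tokens=max_tokens,  # accepted by both older and reasoning models
)
```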

codeflash-ai bot added a commit that referenced this pull request Apr 23, 2025
 (`openai-apikey-passthrough`)

Based on the line profiling data, a majority of the time is spent on the `client.chat.completions.create` function, which is expected as this involves network latency and processing on OpenAI's servers. While we can't optimize the third-party API call, we can make certain optimizations to reduce redundant operations and improve performance where possible.

### Optimized Code



### Explanation
1. Reuse the `OpenAI` client instance.
   - Instead of creating a new `OpenAI` client instance for every request, I introduced the `OpenAIClient` class to initialize and hold the client instance.
   - This reduces the redundant creation overhead during subsequent requests.
   
2. Use an optional client parameter.
   - The `_execute_openai_request` function optionally accepts a pre-initialized `OpenAIClient` instance (`client`).
   - If a client instance is not passed, it initializes a new one internally.

### Additional Considerations
- The performance gains from reducing the client creation overhead depend on the usage pattern. If many requests are instantiated in a sequence, reusing the client instance will offer significant performance benefits.

By making these changes, the redundant client initialization is avoided, which helps in optimizing the overall runtime. Note that network latency and API processing time on OpenAI's servers are beyond our control.
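
The optimized code itself was not captured above; based on the explanation, it would look roughly like the following sketch (names such as `OpenAIClient` follow the description, and the exact signature of `_execute_openai_request` is an assumption):

```python
from typing import List, Optional

from openai import OpenAI


class OpenAIClient:
    """Wraps a single OpenAI client instance so it can be reused across requests."""

    def __init__(self, api_key: str):
        self.client = OpenAI(api_key=api_key)


def _execute_openai_request(
    openai_api_key: str,
    prompt: List[dict],
    gpt_model_version: str,
    max_tokens: Optional[int] = None,
    client: Optional[OpenAIClient] = None,
) -> str:
    # Reuse the pre-initialized client when one is passed in; otherwise fall back
    # to creating a fresh client for this call.
    if client is None:
        client = OpenAIClient(api_key=openai_api_key)
    response = client.client.chat.completions.create(
        model=gpt_model_version,
        messages=prompt,
        max_tokens=max_tokens,
    )
    return response.choices[0].message.content
```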

codeflash-ai bot commented Apr 23, 2025

⚡️ Codeflash found optimizations for this PR

📄 7,565,420% (75,654.20x) speedup for _execute_openai_request in inference/core/workflows/core_steps/models/foundation/openai/v3.py

⏱️ Runtime: 2.20 seconds → 29.0 microseconds (best of 21 runs)

I created a new dependent PR with the suggested changes. Please review:

If you approve, it will be merged into this PR (branch openai-apikey-passthrough).

codeflash-ai bot added a commit that referenced this pull request Apr 23, 2025
…(`openai-apikey-passthrough`)

To optimize the given code for better performance, it makes sense to avoid recreating objects and connections on each call if they can be persistent. We can also avoid repeated calculations or assignments and ensure that payload construction is more efficient.

Here's a modified version of the code optimized for performance.



### Key Optimizations
1. **Persistent OpenAI Client**: Instead of initializing the `OpenAI` client in every request, we create and reuse it. This is done via the `get_openai_client` function which stores clients in a dictionary, avoiding repeated initializations.
2. **Efficient Payload Update**: Use a dictionary unpacking technique to avoid additional conditional checks and dictionary operations.

These changes should help in reducing the runtime and memory footprint, especially in scenarios with frequent requests.
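
Again, the code block was not captured above; a hedged sketch of what a `get_openai_client` cache and the dictionary-unpacking payload construction could look like (the payload field names here are illustrative):

```python
from typing import Dict, List, Optional

from openai import OpenAI

# Module-level cache: one client per API key, created lazily on first use.
_OPENAI_CLIENTS: Dict[str, OpenAI] = {}


def get_openai_client(api_key: str) -> OpenAI:
    client = _OPENAI_CLIENTS.get(api_key)
    if client is None:
        client = OpenAI(api_key=api_key)
        _OPENAI_CLIENTS[api_key] = client
    return client


def build_payload(model: str, messages: List[dict], max_tokens: Optional[int]) -> dict:
    # Dictionary unpacking adds the optional field only when it has a value,
    # avoiding a separate conditional assignment after construction.
    return {
        "model": model,
        "messages": messages,
        **({"max_tokens": max_tokens} if max_tokens is not None else {}),
    }
```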

codeflash-ai bot commented Apr 23, 2025

⚡️ Codeflash found optimizations for this PR

📄 156,043% (1,560.43x) speedup for execute_gpt_4v_request in inference/core/workflows/core_steps/models/foundation/openai/v3.py

⏱️ Runtime: 1.90 seconds → 1.22 milliseconds (best of 23 runs)

I created a new dependent PR with the suggested changes. Please review:

If you approve, it will be merged into this PR (branch openai-apikey-passthrough).

codeflash-ai bot added a commit that referenced this pull request May 14, 2025
…penai-apikey-passthrough`)

Here’s an optimized version of your function. The **vast majority of runtime (over 99%)** comes from the two lines that interact with the OpenAI SDK.

- `client = OpenAI(api_key=openai_api_key)`
- `client.chat.completions.create(…)`

The first can be improved by **reusing the client instance** instead of creating a new one every call. For repeated calls in the same process, **persisting the OpenAI client** will save you much time.

Here’s an optimized implementation.



**Key optimizations:**
- The OpenAI client is created only once per unique API key, drastically reducing object creation overhead.
- No changes to the function signature or return values.
- Thread safety is not handled explicitly, but if you plan to use this concurrently you could add thread locks or use `threading.local` for clients.

**If you never use multiple API keys in one process,** you may further simplify by keeping a single module-global client instance.

This is as fast as possible on the **client side**. The remote API call, which dominates total runtime, cannot be further optimized from inside the client.
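
For illustration only, a thread-safe variant of that per-key cache could look like the sketch below; the bot's change does not include locking, so this is an assumption about how one might add it:

```python
import threading
from typing import Dict

from openai import OpenAI

_CLIENTS: Dict[str, OpenAI] = {}
_CLIENTS_LOCK = threading.Lock()


def get_client_threadsafe(api_key: str) -> OpenAI:
    # Guard the cache with a lock so concurrent callers never race on creation.
    with _CLIENTS_LOCK:
        client = _CLIENTS.get(api_key)
        if client is None:
            client = OpenAI(api_key=api_key)
            _CLIENTS[api_key] = client
    return client
```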

codeflash-ai bot commented May 14, 2025

⚡️ Codeflash found optimizations for this PR

📄 95% (0.95x) speedup for _execute_openai_request in inference/core/workflows/core_steps/models/foundation/openai/v3.py

⏱️ Runtime: 1.69 seconds → 867 milliseconds (best of 5 runs)

I created a new dependent PR with the suggested changes. Please review:

If you approve, it will be merged into this PR (branch openai-apikey-passthrough).

…penai-apikey-passthrough`)

Here is an optimized version of your program for runtime and memory. The majority of runtime is IO/network-bound (API requests) and not CPU-bound code, so the best possible single-process CPU optimization is to **avoid repeated work** (e.g., repeated endpoint string formatting or client allocation) and **simplify fast paths**. If you can batch or async requests, that would reduce end-to-end latency, but that changes function signatures and semantics so is out of scope. Here we focus on making your function as lean as possible within its expected use. 

**Key improvements:**
- **Reuse OpenAI client (`OpenAI`) where possible**: Creating the client is surprisingly expensive per your profiling.
- **Optimize prompt and payload building:** Avoid unnecessary field-level assignments.
- **Use exception chaining efficiently.**
- **Minimize calls to `.startswith()` by using a tuple form.**
- **Precompute endpoint format string if possible.**
- **Move non-error computations out of try/except.**



**Summary:**  
- OpenAI client creation is now cached, saving repeated cost.
- Efficient prefix checking for OpenAI key.
- Payloads & try/except blocks are trimmed for speed and clarity.
- Function signatures and return values are preserved.
- Comments are updated only where logic is improved or needs clarification.

If you control parallelism at a higher level, running requests in parallel (with `asyncio` or threading) would yield much higher throughput as both requests and OpenAI are IO bound.
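
As a hedged illustration of that last point, a thread pool can overlap the IO-bound calls; `run_single_request` below is a hypothetical stand-in for the per-image request function, not something defined in this PR:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def run_requests_in_parallel(
    inputs: Iterable,
    run_single_request: Callable,
    max_workers: int = 8,
) -> List:
    # Each call is dominated by network latency, so threads overlap the waits
    # even under the GIL; results come back in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(run_single_request, inputs))
```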

codeflash-ai bot commented May 14, 2025

⚡️ Codeflash found optimizations for this PR

📄 100% (1.00x) speedup for execute_gpt_4v_request in inference/core/workflows/core_steps/models/foundation/openai/v3.py

⏱️ Runtime: 107 milliseconds → 53.6 milliseconds (best of 5 runs)

I created a new dependent PR with the suggested changes. Please review:

If you approve, it will be merged into this PR (branch openai-apikey-passthrough).


codeflash-ai bot commented May 19, 2025

This PR is now faster! 🚀 @hansent accepted my optimizations from:
