
v3 openAI block with support for api key passthrough #1214


Draft · wants to merge 11 commits into main

Conversation

@hansent (Contributor) commented Apr 23, 2025

Description

Draft V3 OpenAI block with support for API key passthrough via the Roboflow API.

Type of change

  • New feature (non-breaking change which adds functionality)

How has this change been tested? Please provide a test case or an example of how you tested the change.

locally

Any specific deployment considerations

The proxy / Roboflow API key feature is not live yet.

Docs

  • Docs updated? What were the changes:

model_version: Union[
    Selector(kind=[STRING_KIND]), Literal["gpt-4o", "gpt-4o-mini"]
] = Field(
    default="gpt-4o",

Contributor commented:

Default to gpt-4.1 maybe, since it's the flagship model now.

    examples=["auto", "high", "low"],
)
max_tokens: int = Field(
    default=450,

Contributor commented:

I think we should default to None to avoid problems with reasoning models, since we don't give users the finish reason.

response = client.chat.completions.create(
    model=gpt_model_version,
    messages=prompt,
    max_tokens=max_tokens,

Contributor commented:

OpenAI deprecated this parameter. There's a new one called max_completion_tokens. Both work with "older" models, but only the new one works with reasoning models.

https://platform.openai.com/docs/api-reference/chat/create#chat-create-max_completion_tokens
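
A minimal sketch of what the suggested change could look like, reusing the names from the excerpt above (the actual block code in this PR may differ):

```python
# Sketch only: swap the deprecated max_tokens parameter for max_completion_tokens.
# The variables (client, gpt_model_version, prompt, max_tokens) are assumed to be
# the same ones as in the excerpt above.
response = client.chat.completions.create(
    model=gpt_model_version,
    messages=prompt,
    max_completion_tokens=max_tokens,  # accepted by both older and reasoning models
)
```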

codeflash-ai bot added a commit that referenced this pull request Apr 23, 2025
 (`openai-apikey-passthrough`)

Based on the line profiling data, a majority of the time is spent on the `client.chat.completions.create` function, which is expected as this involves network latency and processing on OpenAI's servers. While we can't optimize the third-party API call, we can make certain optimizations to reduce redundant operations and improve performance where possible.

### Optimized Code



### Explanation
1. Reuse the `OpenAI` client instance.
   - Instead of creating a new `OpenAI` client instance for every request, I introduced the `OpenAIClient` class to initialize and hold the client instance.
   - This reduces the redundant creation overhead during subsequent requests.
   
2. Use an optional client parameter.
   - The `_execute_openai_request` function optionally accepts a pre-initialized `OpenAIClient` instance (`client`).
   - If a client instance is not passed, it initializes a new one internally.

### Additional Considerations
- The performance gains from reducing the client creation overhead depend on the usage pattern. If many requests are instantiated in a sequence, reusing the client instance will offer significant performance benefits.

By making these changes, the redundant client initialization is avoided, which helps in optimizing the overall runtime. Note that network latency and API processing time on OpenAI's servers are beyond our control.
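
The optimized code itself was not captured above; based on the explanation, it would look roughly like the following sketch (names such as `OpenAIClient` follow the description, and the exact signature of `_execute_openai_request` is an assumption):

```python
from typing import List, Optional

from openai import OpenAI


class OpenAIClient:
    """Wraps a single OpenAI client instance so it can be reused across requests."""

    def __init__(self, api_key: str):
        self.client = OpenAI(api_key=api_key)


def _execute_openai_request(
    openai_api_key: str,
    prompt: List[dict],
    gpt_model_version: str,
    max_tokens: Optional[int] = None,
    client: Optional[OpenAIClient] = None,
) -> str:
    # Reuse the pre-initialized client when one is passed in; otherwise fall back
    # to creating a fresh client for this call.
    if client is None:
        client = OpenAIClient(api_key=openai_api_key)
    response = client.client.chat.completions.create(
        model=gpt_model_version,
        messages=prompt,
        max_tokens=max_tokens,
    )
    return response.choices[0].message.content
```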

codeflash-ai bot commented Apr 23, 2025

⚡️ Codeflash found optimizations for this PR

📄 7,565,420% (75,654.20x) speedup for _execute_openai_request in inference/core/workflows/core_steps/models/foundation/openai/v3.py

⏱️ Runtime: 2.20 seconds → 29.0 microseconds (best of 21 runs)

I created a new dependent PR with the suggested changes. Please review:

If you approve, it will be merged into this PR (branch openai-apikey-passthrough).

codeflash-ai bot added a commit that referenced this pull request Apr 23, 2025
…(`openai-apikey-passthrough`)

To optimize the given code for better performance, it makes sense to avoid recreating objects and connections on each call if they can be persistent. We can also avoid repeated calculations or assignments and ensure that payload construction is more efficient.

Here's a modified version of the code optimized for performance.



### Key Optimizations
1. **Persistent OpenAI Client**: Instead of initializing the `OpenAI` client in every request, we create and reuse it. This is done via the `get_openai_client` function which stores clients in a dictionary, avoiding repeated initializations.
2. **Efficient Payload Update**: Use a dictionary unpacking technique to avoid additional conditional checks and dictionary operations.

These changes should help in reducing the runtime and memory footprint, especially in scenarios with frequent requests.
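
Again, the code block was not captured above; a hedged sketch of what a `get_openai_client` cache and the dictionary-unpacking payload construction could look like (the payload field names here are illustrative):

```python
from typing import Dict, List, Optional

from openai import OpenAI

# Module-level cache: one client per API key, created lazily on first use.
_OPENAI_CLIENTS: Dict[str, OpenAI] = {}


def get_openai_client(api_key: str) -> OpenAI:
    client = _OPENAI_CLIENTS.get(api_key)
    if client is None:
        client = OpenAI(api_key=api_key)
        _OPENAI_CLIENTS[api_key] = client
    return client


def build_payload(model: str, messages: List[dict], max_tokens: Optional[int]) -> dict:
    # Dictionary unpacking adds the optional field only when it has a value,
    # avoiding a separate conditional assignment after construction.
    return {
        "model": model,
        "messages": messages,
        **({"max_tokens": max_tokens} if max_tokens is not None else {}),
    }
```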

codeflash-ai bot commented Apr 23, 2025

⚡️ Codeflash found optimizations for this PR

📄 156,043% (1,560.43x) speedup for execute_gpt_4v_request in inference/core/workflows/core_steps/models/foundation/openai/v3.py

⏱️ Runtime: 1.90 seconds → 1.22 milliseconds (best of 23 runs)

I created a new dependent PR with the suggested changes. Please review:

If you approve, it will be merged into this PR (branch openai-apikey-passthrough).

codeflash-ai bot added a commit that referenced this pull request May 14, 2025
…penai-apikey-passthrough`)

Here’s an optimized version of your function. The **vast majority of runtime (over 99%)** comes from the two lines that interact with the OpenAI SDK.

- `client = OpenAI(api_key=openai_api_key)`
- `client.chat.completions.create(…)`

The first can be improved by **reusing the client instance** instead of creating a new one every call. For repeated calls in the same process, **persisting the OpenAI client** will save you much time.

Here’s an optimized implementation.



**Key optimizations:**
- The OpenAI client is created only once per unique API key, drastically reducing object creation overhead.
- No changes to the function signature or return values.
- Thread safety is not handled explicitly, but if you plan to use this concurrently you could add thread locks or use `threading.local` for clients.

**If you never use multiple API keys in one process,** you may further simplify by keeping a single module-global client instance.

This is as fast as possible on the **client side**. The remote API call, which dominates total runtime, cannot be further optimized from inside the client.
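
For illustration only, a thread-safe variant of that per-key cache could look like the sketch below; the bot's change does not include locking, so this is an assumption about how one might add it:

```python
import threading
from typing import Dict

from openai import OpenAI

_CLIENTS: Dict[str, OpenAI] = {}
_CLIENTS_LOCK = threading.Lock()


def get_client_threadsafe(api_key: str) -> OpenAI:
    # Guard the cache with a lock so concurrent callers never race on creation.
    with _CLIENTS_LOCK:
        client = _CLIENTS.get(api_key)
        if client is None:
            client = OpenAI(api_key=api_key)
            _CLIENTS[api_key] = client
    return client
```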

codeflash-ai bot commented May 14, 2025

⚡️ Codeflash found optimizations for this PR

📄 95% (0.95x) speedup for _execute_openai_request in inference/core/workflows/core_steps/models/foundation/openai/v3.py

⏱️ Runtime: 1.69 seconds → 867 milliseconds (best of 5 runs)

I created a new dependent PR with the suggested changes. Please review:

If you approve, it will be merged into this PR (branch openai-apikey-passthrough).

…penai-apikey-passthrough`)

Here is an optimized version of your program for runtime and memory. The majority of runtime is IO/network-bound (API requests) and not CPU-bound code, so the best possible single-process CPU optimization is to **avoid repeated work** (e.g., repeated endpoint string formatting or client allocation) and **simplify fast paths**. If you can batch or async requests, that would reduce end-to-end latency, but that changes function signatures and semantics so is out of scope. Here we focus on making your function as lean as possible within its expected use. 

**Key improvements:**
- **Reuse OpenAI client (`OpenAI`) where possible**: Creating the client is surprisingly expensive per your profiling.
- **Optimize prompt and payload building:** Avoid unnecessary field-level assignments.
- **Use exception chaining efficiently.**
- **Minimize calls to `.startswith()` by using a tuple form.**
- **Precompute endpoint format string if possible.**
- **Move non-error computations out of try/except.**



**Summary:**  
- OpenAI client creation is now cached, saving repeated cost.
- Efficient prefix checking for OpenAI key.
- Payloads & try/except blocks are trimmed for speed and clarity.
- Function signatures and return values are preserved.
- Comments are updated only where logic is improved or needs clarification.

If you control parallelism at a higher level, running requests in parallel (with `asyncio` or threading) would yield much higher throughput as both requests and OpenAI are IO bound.
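
As a hedged illustration of that last point, a thread pool can overlap the IO-bound calls; `run_single_request` below is a hypothetical stand-in for the per-image request function, not something defined in this PR:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def run_requests_in_parallel(
    inputs: Iterable,
    run_single_request: Callable,
    max_workers: int = 8,
) -> List:
    # Each call is dominated by network latency, so threads overlap the waits
    # even under the GIL; results come back in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(run_single_request, inputs))
```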

codeflash-ai bot commented May 14, 2025

⚡️ Codeflash found optimizations for this PR

📄 100% (1.00x) speedup for execute_gpt_4v_request in inference/core/workflows/core_steps/models/foundation/openai/v3.py

⏱️ Runtime: 107 milliseconds → 53.6 milliseconds (best of 5 runs)

I created a new dependent PR with the suggested changes. Please review:

If you approve, it will be merged into this PR (branch openai-apikey-passthrough).


codeflash-ai bot commented May 19, 2025

This PR is now faster! 🚀 @hansent accepted my optimizations from:
