v3 OpenAI block with support for API key passthrough #1214
base: main
Conversation
model_version: Union[
    Selector(kind=[STRING_KIND]), Literal["gpt-4o", "gpt-4o-mini"]
] = Field(
    default="gpt-4o",
Maybe default to gpt-4.1, since it's the flagship model now.
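A minimal sketch of that suggestion, mirroring the field above (widening the Literal to include the newer model is an assumption, and the surrounding imports are those the block already uses):

```python
model_version: Union[
    Selector(kind=[STRING_KIND]),
    # Hypothetical: add the newer flagship model and make it the default
    Literal["gpt-4.1", "gpt-4o", "gpt-4o-mini"],
] = Field(
    default="gpt-4.1",
)
```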
    examples=["auto", "high", "low"],
)
max_tokens: int = Field(
    default=450,
I think we should default to None to avoid problems with reasoning models, since we don't surface the finish reason to users.
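A hedged sketch of that change, assuming the field otherwise keeps its current shape (switching the annotation to Optional[int] is an assumption):

```python
max_tokens: Optional[int] = Field(
    default=None,  # let the API decide; avoids silent truncation with reasoning models
)
```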
response = client.chat.completions.create(
    model=gpt_model_version,
    messages=prompt,
    max_tokens=max_tokens,
OpenAI deprecated this parameter. There's a new one called max_completion_tokens. Both work with "older" models, but only the new one works with reasoning models.
https://platform.openai.com/docs/api-reference/chat/create#chat-create-max_completion_tokens
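For illustration, a minimal sketch of the call using the newer parameter (variable names mirror the snippet above; this is not the PR's final implementation):

```python
response = client.chat.completions.create(
    model=gpt_model_version,
    messages=prompt,
    # max_completion_tokens supersedes the deprecated max_tokens and also
    # accounts for reasoning tokens on reasoning models
    max_completion_tokens=max_tokens,
)
```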
(`openai-apikey-passthrough`) Based on the line profiling data, a majority of the time is spent in the `client.chat.completions.create` function, which is expected as this involves network latency and processing on OpenAI's servers. While we can't optimize the third-party API call, we can make certain optimizations to reduce redundant operations and improve performance where possible.

### Optimized Code

### Explanation

1. Reuse the `OpenAI` client instance.
   - Instead of creating a new `OpenAI` client instance for every request, I introduced the `OpenAIClient` class to initialize and hold the client instance.
   - This reduces the redundant creation overhead during subsequent requests.
2. Use an optional client parameter.
   - The `_execute_openai_request` function optionally accepts a pre-initialized `OpenAIClient` instance (`client`).
   - If a client instance is not passed, it initializes a new one internally.

### Additional Considerations

- The performance gains from reducing the client creation overhead depend on the usage pattern. If many requests are issued in sequence, reusing the client instance will offer significant performance benefits.

By making these changes, redundant client initialization is avoided, which helps optimize the overall runtime. Note that network latency and API processing time on OpenAI's servers are beyond our control.
⚡️ Codeflash found optimizations for this PR 📄 7,565,420% (75,654.20x) speedup for
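The optimized code itself is collapsed in this view; a minimal sketch of the client-reuse approach described above might look like this (the class name and the optional client parameter follow the comment, everything else is an assumption):

```python
from typing import Optional

from openai import OpenAI


class OpenAIClient:
    """Wraps a single OpenAI client instance so it can be reused across requests."""

    def __init__(self, api_key: str):
        self.client = OpenAI(api_key=api_key)


def _execute_openai_request(
    prompt: list,
    model: str,
    api_key: str,
    client: Optional[OpenAIClient] = None,
):
    # Reuse a pre-initialized client when one is supplied; otherwise create one.
    if client is None:
        client = OpenAIClient(api_key=api_key)
    return client.client.chat.completions.create(model=model, messages=prompt)
```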
…(`openai-apikey-passthrough`) To optimize the given code for better performance, it makes sense to avoid recreating objects and connections on each call if they can be persistent. We can also avoid repeated calculations or assignments and ensure that payload construction is more efficient. Here's a modified version of the code optimized for performance.

### Key Optimizations

1. **Persistent OpenAI client**: Instead of initializing the `OpenAI` client on every request, we create and reuse it. This is done via the `get_openai_client` function, which stores clients in a dictionary, avoiding repeated initializations.
2. **Efficient payload update**: Use a dictionary unpacking technique to avoid additional conditional checks and dictionary operations.

These changes should help reduce the runtime and memory footprint, especially in scenarios with frequent requests.
⚡️ Codeflash found optimizations for this PR 📄 156,043% (1,560.43x) speedup for
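The payload trick referenced above is essentially a conditional dictionary spread; a hedged illustration (field and variable names are assumptions, not the generated code):

```python
# Build the request payload in one expression; the spread only adds the token
# limit when one was actually configured, so no separate if/assignment is needed.
payload = {
    "model": gpt_model_version,
    "messages": prompt,
    **({"max_completion_tokens": max_tokens} if max_tokens is not None else {}),
}
response = client.chat.completions.create(**payload)
```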
…penai-apikey-passthrough`) Here's an optimized version of your function. The **vast majority of runtime (over 99%)** comes from the two lines that interact with the OpenAI SDK:

- `client = OpenAI(api_key=openai_api_key)`
- `client.chat.completions.create(…)`

The first can be improved by **reusing the client instance** instead of creating a new one on every call. For repeated calls in the same process, **persisting the OpenAI client** will save you much time. Here's an optimized implementation.

**Key optimizations:**

- The OpenAI client is created only once per unique API key, drastically reducing object creation overhead.
- No changes to the function signature or return values.
- Thread safety is not handled explicitly, but if you plan to use this concurrently you could add thread locks or use `threading.local` for clients.

**If you never use multiple API keys in one process,** you may further simplify by keeping a single module-global client instance. This is as fast as possible on the **client side**. The remote API call, which dominates total runtime, cannot be further optimized from inside the client.
⚡️ Codeflash found optimizations for this PR 📄 95% (0.95x) speedup for
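A sketch of per-API-key caching with the optional lock the comment mentions for concurrent use; the function name reuses `get_openai_client` from the earlier comment, and the cache structure is an assumption, not the generated code:

```python
import threading
from typing import Dict

from openai import OpenAI

_CLIENTS: Dict[str, OpenAI] = {}
_CLIENTS_LOCK = threading.Lock()


def get_openai_client(api_key: str) -> OpenAI:
    # Create the client only once per unique API key and reuse it afterwards.
    with _CLIENTS_LOCK:
        if api_key not in _CLIENTS:
            _CLIENTS[api_key] = OpenAI(api_key=api_key)
        return _CLIENTS[api_key]
```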
…penai-apikey-passthrough`) Here is an optimized version of your program for runtime and memory. The majority of runtime is IO/network-bound (API requests) rather than CPU-bound code, so the best possible single-process CPU optimization is to **avoid repeated work** (e.g., repeated endpoint string formatting or client allocation) and **simplify fast paths**. If you can batch or async requests, that would reduce end-to-end latency, but that changes function signatures and semantics, so it is out of scope. Here we focus on making your function as lean as possible within its expected use.

**Key improvements:**

- **Reuse the OpenAI client (`OpenAI`) where possible**: creating the client is surprisingly expensive per your profiling.
- **Optimize prompt and payload building**: avoid unnecessary field-level assignments.
- **Use exception chaining efficiently.**
- **Minimize calls to `.startswith()` by using the tuple form.**
- **Precompute the endpoint format string if possible.**
- **Move non-error computations out of try/except.**

**Summary:**

- OpenAI client creation is now cached, saving repeated cost.
- Efficient prefix checking for the OpenAI key.
- Payloads and try/except blocks are trimmed for speed and clarity.
- Function signatures and return values are preserved.
- Comments are updated only where logic is improved or needs clarification.

If you control parallelism at a higher level, running requests in parallel (with `asyncio` or threading) would yield much higher throughput, as both requests and OpenAI are IO-bound.
⚡️ Codeflash found optimizations for this PR 📄 100% (1.00x) speedup for
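Two of the listed micro-optimizations can be shown briefly; the key prefixes and the endpoint template below are illustrative assumptions, not the block's actual values:

```python
# One startswith() call over a tuple of prefixes instead of several chained checks.
OPENAI_KEY_PREFIXES = ("sk-", "sk-proj-")
looks_like_openai_key = api_key.startswith(OPENAI_KEY_PREFIXES)

# Precompute the endpoint template once at module import instead of per call.
PROXY_ENDPOINT_TEMPLATE = "{base_url}/openai/v1/chat/completions"
endpoint = PROXY_ENDPOINT_TEMPLATE.format(base_url=base_url)
```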
…14-2025-05-14T16.32.54
…-05-14T16.32.54 ⚡️ Speed up function `execute_gpt_4v_request` by 100% in PR #1214 (`openai-apikey-passthrough`)
This PR is now faster! 🚀 @hansent accepted my optimizations from:
Description
Draft v3 OpenAI block with support for API key passthrough via the Roboflow API.
Type of change
How has this change been tested? Please provide a test case or an example of how you tested the change.
locally
Any specific deployment considerations
Proxy / Roboflow API key feature is not live yet.
Docs