Description
Issue encountered
Currently, a few options that can be very useful when running evaluations are not exposed through the `LiteLLMModelConfig`:
- It would be nice to have a `verbose` flag for debugging issues related to litellm.
- If you know the maximum context length of your model, it would be nice to set it explicitly instead of relying on the default of 4096 that is currently hardcoded in the `max_length` property.
- Different APIs differ in their robustness and may have different rate limits. It would be nice to configure the number of retries performed when calling the API, the waiting time between requests, and possibly a timeout for requests that take too long (see the sketch below for what this could look like).

Additionally, it would be nice to apply the current strategy for the o1 model in `_prepare_max_new_tokens` to other reasoning models as well.
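For illustration, here is a minimal sketch of what configurable retries with exponential backoff and a timeout could look like. The function and the broad exception handling are hypothetical and only meant to show the intent; the option names mirror those proposed in the PR below.

```python
import time

import litellm


def completion_with_retries(
    model: str,
    messages: list[dict],
    api_max_retry: int = 8,
    api_retry_sleep: float = 1.0,
    api_retry_multiplier: float = 2.0,
    timeout: float | None = None,
):
    """Call litellm.completion, retrying failed requests with exponential backoff."""
    sleep_time = api_retry_sleep
    for attempt in range(api_max_retry):
        try:
            return litellm.completion(model=model, messages=messages, timeout=timeout)
        except Exception:
            # On the last attempt, give up and propagate the error.
            if attempt == api_max_retry - 1:
                raise
            time.sleep(sleep_time)
            sleep_time *= api_retry_multiplier
```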
Solution/Feature
I created a PR that implements the suggested changes above. It introduces new options in the `LiteLLMModelConfig`:
"""
(...)
verbose (bool):
Whether to enable verbose logging. Default is False.
max_model_length (int | None):
Maximum context length for the model. If None, infers the model's default max length.
api_max_retry (int):
Maximum number of retries for API requests. Default is 8.
api_retry_sleep (float):
Initial sleep time (in seconds) between retries. Default is 1.0.
api_retry_multiplier (float):
Multiplier for increasing sleep time between retries. Default is 2.0.
timeout (float):
Request timeout in seconds. Default is None (no timeout).
(...)
"""
The increase in the allowed number of tokens is now calculated for all models that litellm recognizes as reasoning models (as indicated by its `supports_reasoning` function). Instead of hardcoded upper bounds, we use litellm's `get_max_tokens` helper function or, if this fails, we query the maximum context length from the model's endpoints on OpenRouter. If the specified provider is present in that list, we take the context length reported by OpenRouter for that provider. Otherwise, we choose the minimum context length among all OpenRouter providers, so that it works at least with every provider listed there. If this also fails, we return the default context length of 4096, the same value that is currently hardcoded.
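As a rough sketch of that fallback order (not the PR's exact code; the OpenRouter URL and JSON field names are assumptions):

```python
import litellm
import requests

DEFAULT_MAX_LENGTH = 4096


def infer_max_model_length(model: str, provider: str | None = None) -> int:
    """Best-effort lookup of a model's context length, falling back to 4096."""
    # 1) Ask litellm first.
    try:
        max_tokens = litellm.get_max_tokens(model)
        if max_tokens is not None:
            return max_tokens
    except Exception:
        pass

    # 2) Fall back to OpenRouter's endpoint listing (URL and fields assumed).
    try:
        response = requests.get(
            f"https://openrouter.ai/api/v1/models/{model}/endpoints", timeout=10
        )
        response.raise_for_status()
        endpoints = response.json()["data"]["endpoints"]
        if provider is not None:
            matching = [e for e in endpoints if e["provider_name"] == provider]
            if matching:
                return matching[0]["context_length"]
        # Use the smallest context length so every listed provider works.
        return min(e["context_length"] for e in endpoints)
    except Exception:
        # 3) Last resort: the previously hardcoded default.
        return DEFAULT_MAX_LENGTH
```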
In order to use the `supports_reasoning` function of litellm, I had to update the minimum required version of litellm in pyproject.toml to 1.66.0.