
[FT] Additional LiteLLM model config options and better context length estimate #966

@rolshoven

Description

Issue encountered

Currently, there are a few things that are not exposed through the LiteLLMModelConfig but would be very useful when running evaluations:

  • It would be nice to have a verbose flag for cases where you want to debug something related to litellm.
  • If you know the maximum context length of your model, it would be nice to set it explicitly instead of relying on the default length of 4096 that is currently hardcoded in the max_length property.
  • APIs differ in robustness and rate limits. It would be nice to configure the number of retries performed when calling the API, the waiting time between retries, and a timeout for requests that take too long.

Additionally, it would be nice to apply the current strategy for the o1 model in _prepare_max_new_tokens to other reasoning models as well.
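As a rough illustration only (a minimal sketch, not the actual lighteval implementation; the multiplier and the simplified signature of _prepare_max_new_tokens are assumptions), the generalization could gate the token-budget increase on litellm's supports_reasoning helper instead of checking for "o1" by name:

import litellm

def _prepare_max_new_tokens(model_name: str, max_new_tokens: int | None) -> int | None:
    # Simplified stand-in: reasoning models spend part of the budget on hidden
    # reasoning tokens, so give them extra headroom instead of special-casing "o1".
    if max_new_tokens is None:
        return None
    try:
        is_reasoning = litellm.supports_reasoning(model=model_name)
    except Exception:
        is_reasoning = False  # unknown model: keep the previous behavior
    if is_reasoning:
        max_new_tokens *= 10  # hypothetical multiplier, not an exact factor
    return max_new_tokens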

Solution/Feature

I created a PR that implements the suggested changes above. It introduces new options in the LiteLLMModelConfig:

"""
(...)
verbose (bool):
    Whether to enable verbose logging. Default is False.
max_model_length (int | None):
    Maximum context length for the model. If None, infers the model's default max length.
api_max_retry (int):
    Maximum number of retries for API requests. Default is 8.
api_retry_sleep (float):
    Initial sleep time (in seconds) between retries. Default is 1.0.
api_retry_multiplier (float):
    Multiplier for increasing sleep time between retries. Default is 2.0.
timeout (float):
    Request timeout in seconds. Default is None (no timeout).
(...)
"""

The increased token allowance is now computed for every model that litellm recognizes as a reasoning model (as indicated by its supports_reasoning function). Instead of hardcoded upper bounds, we use litellm's get_max_tokens helper function; if that fails, we query the maximum context length of the model's endpoints on OpenRouter. If the specified provider is present in that list, we take the context length reported by OpenRouter for that provider. Otherwise, we choose the minimum context length among all OpenRouter providers, so that the value works with every provider listed there. If this also fails, we return the default context length of 4096, the same value that is currently hardcoded.
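A rough sketch of this fallback chain follows. It is not the PR's exact code; the OpenRouter endpoint URL and response fields in particular are assumptions.

import litellm
import requests

DEFAULT_MAX_LENGTH = 4096

def estimate_max_context_length(model_name: str, provider: str | None = None) -> int:
    # 1. Ask litellm first.
    try:
        max_tokens = litellm.get_max_tokens(model_name)
        if max_tokens:
            return max_tokens
    except Exception:
        pass

    # 2. Query OpenRouter's per-provider endpoint listing (URL/fields assumed).
    try:
        slug = model_name.removeprefix("openrouter/")
        resp = requests.get(f"https://openrouter.ai/api/v1/models/{slug}/endpoints", timeout=10)
        resp.raise_for_status()
        endpoints = resp.json()["data"]["endpoints"]
        if provider:
            for endpoint in endpoints:
                if endpoint["provider_name"].lower() == provider.lower():
                    return endpoint["context_length"]
        # Provider not listed: take the minimum so the value is safe for every provider.
        return min(endpoint["context_length"] for endpoint in endpoints)
    except Exception:
        pass

    # 3. Fall back to the previously hardcoded default.
    return DEFAULT_MAX_LENGTH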

In order to use litellm's supports_reasoning function, I had to raise the minimum required litellm version in pyproject.toml to 1.66.0.
