
[FT] Additional LiteLLM model config options and better context length estimate #966

@rolshoven

Description

Issue encountered

Currently, there are a few things that are not exposed through the LiteLLMModelConfig but would be very useful when running evaluations:

  • It would be nice to have a verbose flag for cases where you want to debug something related to litellm.
  • If you know the maximum context length of your model, it would be nice to set it explicitly instead of relying on the default length of 4096 that is currently hardcoded in the max_length property.
  • APIs differ in robustness and rate limits. It would be nice to configure the number of retries performed when calling the API, the waiting time between retries, and a timeout for requests that take too long.

Additionally, it would be nice to apply the current strategy for the o1 model in _prepare_max_new_tokens to other reasoning models as well.
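As a rough illustration only (a minimal sketch, not the actual lighteval implementation; the multiplier and the simplified signature of _prepare_max_new_tokens are assumptions), the generalization could gate the token-budget increase on litellm's supports_reasoning helper instead of checking for "o1" by name:

import litellm

def _prepare_max_new_tokens(model_name: str, max_new_tokens: int | None) -> int | None:
    # Simplified stand-in: reasoning models spend part of the budget on hidden
    # reasoning tokens, so give them extra headroom instead of special-casing "o1".
    if max_new_tokens is None:
        return None
    try:
        is_reasoning = litellm.supports_reasoning(model=model_name)
    except Exception:
        is_reasoning = False  # unknown model: keep the previous behavior
    if is_reasoning:
        max_new_tokens *= 10  # hypothetical multiplier, not an exact factor
    return max_new_tokens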

Solution/Feature

I created a PR that implements the suggested changes above. It introduces new options in the LiteLLMModelConfig:

"""
(...)
verbose (bool):
    Whether to enable verbose logging. Default is False.
max_model_length (int | None):
    Maximum context length for the model. If None, infers the model's default max length.
api_max_retry (int):
    Maximum number of retries for API requests. Default is 8.
api_retry_sleep (float):
    Initial sleep time (in seconds) between retries. Default is 1.0.
api_retry_multiplier (float):
    Multiplier for increasing sleep time between retries. Default is 2.0.
timeout (float):
    Request timeout in seconds. Default is None (no timeout).
(...)
"""

The increased token allowance is now computed for every model that litellm recognizes as a reasoning model (as indicated by its supports_reasoning function). Instead of hardcoded upper bounds, we use litellm's get_max_tokens helper function; if that fails, we query the maximum context length of the model's endpoints on OpenRouter. If the specified provider is present in that list, we take the context length reported by OpenRouter for that provider. Otherwise, we choose the minimum context length among all OpenRouter providers, so that the value works with every provider listed there. If this also fails, we return the default context length of 4096, the same value that is currently hardcoded.
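A rough sketch of this fallback chain follows. It is not the PR's exact code; the OpenRouter endpoint URL and response fields in particular are assumptions.

import litellm
import requests

DEFAULT_MAX_LENGTH = 4096

def estimate_max_context_length(model_name: str, provider: str | None = None) -> int:
    # 1. Ask litellm first.
    try:
        max_tokens = litellm.get_max_tokens(model_name)
        if max_tokens:
            return max_tokens
    except Exception:
        pass

    # 2. Query OpenRouter's per-provider endpoint listing (URL/fields assumed).
    try:
        slug = model_name.removeprefix("openrouter/")
        resp = requests.get(f"https://openrouter.ai/api/v1/models/{slug}/endpoints", timeout=10)
        resp.raise_for_status()
        endpoints = resp.json()["data"]["endpoints"]
        if provider:
            for endpoint in endpoints:
                if endpoint["provider_name"].lower() == provider.lower():
                    return endpoint["context_length"]
        # Provider not listed: take the minimum so the value is safe for every provider.
        return min(endpoint["context_length"] for endpoint in endpoints)
    except Exception:
        pass

    # 3. Fall back to the previously hardcoded default.
    return DEFAULT_MAX_LENGTH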

In order to use litellm's supports_reasoning function, I had to raise the minimum required litellm version in pyproject.toml to 1.66.0.
