fix: cache llm windows size #1231
base: master
Conversation
Walkthrough
This PR refactors the LLM interface: the method-based accessors get_context_window_size() and get_maximum_output_token() are replaced by cached properties context_window_size and maximum_output_token, with call sites and tests updated accordingly.

Changes

Sequence Diagram(s): omitted

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

Pre-merge checks
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
holmes/core/llm.py (1)
277-304: Major: Potential race condition in caching logic.
If multiple threads access this property before _max_context_size is cached, the lookup and warning log could be executed multiple times, defeating the purpose of caching. Consider using a lock or initializing the cache values during __init__ with a sentinel value to ensure single execution.

🔎 Proposed fix using lock
Add a lock to the class and use it during lookup:
```python
# In __init__:
self._cache_lock = threading.Lock()

# In the property:
@property
def context_window_size(self) -> int:
    if self._max_context_size is not None:
        return self._max_context_size
    with self._cache_lock:
        # Double-check after acquiring lock
        if self._max_context_size is not None:
            return self._max_context_size
        # ... rest of lookup logic ...
```
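For context, a self-contained sketch of the double-checked-locking approach suggested above; the class name, single-name lookup, and 128000 fallback are illustrative assumptions rather than the actual holmes/core/llm.py code:

```python
import threading
from typing import Optional

import litellm


class CachedLLM:
    """Minimal sketch of a thread-safe, cached context-window lookup."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._max_context_size: Optional[int] = None  # None = not resolved yet
        self._cache_lock = threading.Lock()

    @property
    def context_window_size(self) -> int:
        # Fast path: already cached, no locking needed.
        if self._max_context_size is not None:
            return self._max_context_size
        with self._cache_lock:
            # Double-check after acquiring the lock so the lookup (and any
            # warning it logs) runs at most once per instance.
            if self._max_context_size is not None:
                return self._max_context_size
            try:
                self._max_context_size = litellm.model_cost[self.name]["max_input_tokens"]
            except KeyError:
                self._max_context_size = 128000  # hypothetical fallback default
            return self._max_context_size
```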
🧹 Nitpick comments (2)
holmes/core/llm.py (2)
290-295: Refactor: Consider moving return to else block.
The static analysis tool suggests restructuring this try-except to improve clarity by moving the successful return to an else block.
🔎 Proposed refactor
```diff
 for name in self._get_model_name_variants_for_lookup():
     try:
         self._max_context_size = litellm.model_cost[name]["max_input_tokens"]
-        return self._max_context_size
     except Exception:
         continue
+    else:
+        return self._max_context_size
```

Based on static analysis hints.
457-468: Refactor: Consider moving return to else block.
Similar to the context_window_size property, the static analysis tool suggests restructuring this try-except for clarity.

🔎 Proposed refactor

```diff
 for name in self._get_model_name_variants_for_lookup():
     try:
         litellm_max_output_tokens = litellm.model_cost[name][
             "max_output_tokens"
         ]
         if litellm_max_output_tokens < max_output_tokens:
             max_output_tokens = litellm_max_output_tokens
         self._max_output_tokens = max_output_tokens
-        return self._max_output_tokens
     except Exception:
         continue
+    else:
+        return self._max_output_tokens
```

Based on static analysis hints.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (17)
- examples/custom_llm.py
- holmes/config.py
- holmes/core/conversations.py
- holmes/core/feedback.py
- holmes/core/llm.py
- holmes/core/tools_utils/tool_context_window_limiter.py
- holmes/core/truncation/input_context_window_limiter.py
- holmes/interactive.py
- holmes/utils/console/logging.py
- tests/config_class/test_config_api_base_version.py
- tests/config_class/test_config_load_model_list.py
- tests/config_class/test_config_load_robusta_ai.py
- tests/conftest.py
- tests/core/test_feedback.py
- tests/core/tools_utils/test_tool_context_window_limiter.py
- tests/test_approval_workflow.py
- tests/test_interactive.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use Ruff for formatting and linting (configured in pyproject.toml)
Type hints required (mypy configuration in pyproject.toml)
ALWAYS place Python imports at the top of the file, not inside functions or methods
Files:
- holmes/config.py
- tests/test_approval_workflow.py
- tests/conftest.py
- tests/config_class/test_config_load_model_list.py
- tests/core/tools_utils/test_tool_context_window_limiter.py
- holmes/core/conversations.py
- holmes/utils/console/logging.py
- examples/custom_llm.py
- tests/config_class/test_config_load_robusta_ai.py
- holmes/core/truncation/input_context_window_limiter.py
- tests/test_interactive.py
- holmes/core/feedback.py
- tests/config_class/test_config_api_base_version.py
- tests/core/test_feedback.py
- holmes/interactive.py
- holmes/core/llm.py
- holmes/core/tools_utils/tool_context_window_limiter.py
holmes/config.py
📄 CodeRabbit inference engine (CLAUDE.md)
Load configuration from ~/.holmes/config.yaml or via CLI options in config.py
Files:
holmes/config.py
tests/**
📄 CodeRabbit inference engine (CLAUDE.md)
Tests: Match source structure under tests/
Files:
- tests/test_approval_workflow.py
- tests/conftest.py
- tests/config_class/test_config_load_model_list.py
- tests/core/tools_utils/test_tool_context_window_limiter.py
- tests/config_class/test_config_load_robusta_ai.py
- tests/test_interactive.py
- tests/config_class/test_config_api_base_version.py
- tests/core/test_feedback.py
🧠 Learnings (3)
📚 Learning: 2025-12-21T13:17:48.366Z
Learnt from: CR
Repo: HolmesGPT/holmesgpt PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-21T13:17:48.366Z
Learning: Applies to holmes/core/tool_calling_llm.py : Use LiteLLM for multi-provider support (OpenAI, Anthropic, Azure, etc.) in LLM integration
Applied to files:
- holmes/config.py
- tests/test_approval_workflow.py
- tests/core/tools_utils/test_tool_context_window_limiter.py
- holmes/core/conversations.py
- examples/custom_llm.py
- holmes/core/truncation/input_context_window_limiter.py
- tests/test_interactive.py
- holmes/core/feedback.py
- tests/config_class/test_config_api_base_version.py
- holmes/core/llm.py
- holmes/core/tools_utils/tool_context_window_limiter.py
📚 Learning: 2025-12-21T13:17:48.366Z
Learnt from: CR
Repo: HolmesGPT/holmesgpt PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-21T13:17:48.366Z
Learning: Applies to holmes/core/tool_calling_llm.py : Implement structured tool calling with automatic retry and error handling
Applied to files:
- tests/core/tools_utils/test_tool_context_window_limiter.py
- examples/custom_llm.py
- tests/test_interactive.py
- holmes/core/tools_utils/tool_context_window_limiter.py
📚 Learning: 2025-12-21T13:17:48.366Z
Learnt from: CR
Repo: HolmesGPT/holmesgpt PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-21T13:17:48.366Z
Learning: Applies to holmes/plugins/prompts/**/*.jinja2 : Prompts: holmes/plugins/prompts/{name}.jinja2 file structure
Applied to files:
- holmes/core/conversations.py
- examples/custom_llm.py
🧬 Code graph analysis (11)
holmes/config.py (1)
  holmes/core/llm.py (4)
    - context_window_size (97-98)
    - context_window_size (278-304)
    - maximum_output_token (102-103)
    - maximum_output_token (446-477)
tests/conftest.py (3)
  examples/custom_llm.py (2)
    - context_window_size (17-18)
    - maximum_output_token (21-22)
  holmes/core/llm.py (4)
    - context_window_size (97-98)
    - context_window_size (278-304)
    - maximum_output_token (102-103)
    - maximum_output_token (446-477)
  tests/core/test_feedback.py (2)
    - context_window_size (23-24)
    - maximum_output_token (27-28)
tests/config_class/test_config_load_model_list.py (1)
  holmes/core/llm.py (4)
    - context_window_size (97-98)
    - context_window_size (278-304)
    - maximum_output_token (102-103)
    - maximum_output_token (446-477)
tests/core/tools_utils/test_tool_context_window_limiter.py (4)
  examples/custom_llm.py (1)
    - context_window_size (17-18)
  holmes/core/llm.py (2)
    - context_window_size (97-98)
    - context_window_size (278-304)
  tests/conftest.py (1)
    - context_window_size (98-99)
  tests/core/test_feedback.py (1)
    - context_window_size (23-24)
examples/custom_llm.py (1)
  holmes/core/llm.py (6)
    - LLM (90-135)
    - TokenCountMetadata (56-63)
    - context_window_size (97-98)
    - context_window_size (278-304)
    - maximum_output_token (102-103)
    - maximum_output_token (446-477)
tests/config_class/test_config_load_robusta_ai.py (3)
  examples/custom_llm.py (1)
    - context_window_size (17-18)
  holmes/core/llm.py (2)
    - context_window_size (97-98)
    - context_window_size (278-304)
  tests/conftest.py (1)
    - context_window_size (98-99)
holmes/core/feedback.py (4)
  examples/custom_llm.py (1)
    - context_window_size (17-18)
  holmes/core/llm.py (2)
    - context_window_size (97-98)
    - context_window_size (278-304)
  tests/conftest.py (1)
    - context_window_size (98-99)
  tests/core/test_feedback.py (1)
    - context_window_size (23-24)
tests/config_class/test_config_api_base_version.py (4)
  examples/custom_llm.py (2)
    - context_window_size (17-18)
    - maximum_output_token (21-22)
  holmes/core/llm.py (4)
    - context_window_size (97-98)
    - context_window_size (278-304)
    - maximum_output_token (102-103)
    - maximum_output_token (446-477)
  tests/conftest.py (2)
    - context_window_size (98-99)
    - maximum_output_token (102-103)
  tests/core/test_feedback.py (2)
    - context_window_size (23-24)
    - maximum_output_token (27-28)
holmes/interactive.py (4)
  examples/custom_llm.py (2)
    - context_window_size (17-18)
    - maximum_output_token (21-22)
  holmes/core/llm.py (4)
    - context_window_size (97-98)
    - context_window_size (278-304)
    - maximum_output_token (102-103)
    - maximum_output_token (446-477)
  tests/conftest.py (2)
    - context_window_size (98-99)
    - maximum_output_token (102-103)
  tests/core/test_feedback.py (2)
    - context_window_size (23-24)
    - maximum_output_token (27-28)
holmes/core/llm.py (3)
  examples/custom_llm.py (2)
    - context_window_size (17-18)
    - maximum_output_token (21-22)
  tests/conftest.py (2)
    - context_window_size (98-99)
    - maximum_output_token (102-103)
  tests/core/test_feedback.py (2)
    - context_window_size (23-24)
    - maximum_output_token (27-28)
holmes/core/tools_utils/tool_context_window_limiter.py (5)
  holmes/core/llm.py (2)
    - context_window_size (97-98)
    - context_window_size (278-304)
  holmes/core/tools.py (1)
    - StructuredToolResultStatus (52-76)
  examples/custom_llm.py (1)
    - context_window_size (17-18)
  tests/conftest.py (1)
    - context_window_size (98-99)
  tests/core/test_feedback.py (1)
    - context_window_size (23-24)
🪛 Ruff (0.14.10)
holmes/core/llm.py
293-293: Consider moving this statement to an else block
(TRY300)
466-466: Consider moving this statement to an else block
(TRY300)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: llm_evals
- GitHub Check: build (3.10)
- GitHub Check: build (3.11)
- GitHub Check: build (3.12)
- GitHub Check: build
🔇 Additional comments (20)
holmes/core/feedback.py (1)
29-31: LGTM! The migration from method call to property access is correct and aligns with the broader refactoring to cache LLM configuration values.
holmes/config.py (1)
517-519: LGTM! The log message correctly uses the property-based access pattern for LLM configuration values, consistent with the broader refactoring to cache these values and reduce repeated lookups.
tests/test_interactive.py (1)
296-301: LGTM! The test setup correctly updates the mock to use direct attribute assignment for context_window_size, consistent with the migration to property-based access across the LLM interface.
holmes/core/tools_utils/tool_context_window_limiter.py (1)
14-15: LGTM! The change from method call to property access is correct and aligns with the project-wide refactoring to use cached LLM configuration values.
tests/config_class/test_config_load_robusta_ai.py (1)
143-150: LGTM! The test correctly verifies the context window size using property access, consistent with the broader migration to property-based LLM configuration.
tests/config_class/test_config_load_model_list.py (1)
57-62: LGTM! The mock setup correctly uses direct attribute assignment for the property-based LLM interface, ensuring tests align with the refactored API.
tests/test_approval_workflow.py (3)
84-102: LGTM! The mock setup correctly uses direct attribute assignment for context_window_size and maximum_output_token, consistent with the property-based LLM interface.
218-236: LGTM! The mock setup correctly uses direct attribute assignment for the property-based LLM configuration values.
365-383: LGTM! The mock setup correctly uses direct attribute assignment, consistent with the refactored property-based LLM interface.
holmes/core/truncation/input_context_window_limiter.py (1)
153-154: LGTM! Property access correctly replaces method calls.
The change from llm.get_context_window_size() and llm.get_maximum_output_token() to property access (llm.context_window_size and llm.maximum_output_token) is correct and aligns with the broader refactor. This enables the caching behavior described in the PR objectives, where the underlying properties in holmes/core/llm.py cache the resolved values to avoid repeated litellm lookups and warnings.
tests/config_class/test_config_api_base_version.py (1)
48-49: LGTM! Test mocks correctly updated for property-based interface.
The test updates consistently replace mock_llm_instance.get_context_window_size.return_value with direct attribute assignment mock_llm_instance.context_window_size = 128000, which is the correct approach for mocking properties. All test cases maintain their original assertions and logic while adapting to the new interface.
Also applies to: 77-78, 114-115, 152-153, 190-191, 241-242, 300-301, 340-341, 380-381, 421-422, 462-463
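As an illustration of the mocking pattern described here, a minimal sketch (the mock object name mirrors the review text; the surrounding test code is assumed):

```python
from unittest.mock import MagicMock

mock_llm_instance = MagicMock()

# Old method-based interface: configure the mocked method's return value.
mock_llm_instance.get_context_window_size.return_value = 128000

# New property-based interface: plain attribute assignment is enough, because
# attributes set on a MagicMock are returned as-is when read back.
mock_llm_instance.context_window_size = 128000
mock_llm_instance.maximum_output_token = 4096

assert mock_llm_instance.context_window_size == 128000
```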
holmes/interactive.py (1)
496-497: LGTM! Property access correctly replaces method calls in interactive context display.
The changes from ai.llm.get_context_window_size() and ai.llm.get_maximum_output_token() to property access are correct. These values are used to calculate and display conversation context statistics in the /context command, and the property-based access will benefit from the caching implemented in the underlying LLM class.
holmes/core/conversations.py (1)
27-30: LGTM! Property access correctly replaces method calls in tool size calculation.
The conversion to property access (ai.llm.context_window_size and ai.llm.maximum_output_token) in the calculate_tool_size function is correct. Since this function is called multiple times during conversation building, the caching behavior of these properties will improve performance by avoiding repeated litellm lookups.
examples/custom_llm.py (1)
16-21: LGTM! Example correctly demonstrates property-based LLM interface.
The conversion from methods to properties using @property decorators is correct and provides a clear example for users implementing custom LLM classes. The return values remain unchanged (128000 for context window, 4096 for max output tokens), ensuring the example continues to work as expected.
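A rough sketch of what such a property-based custom LLM looks like (class name is an assumption and the base class is omitted; the real code lives in examples/custom_llm.py):

```python
class MyCustomLLM:
    # In the real example this class implements the holmes LLM interface;
    # only the two accessors relevant to this PR are sketched here.

    @property
    def context_window_size(self) -> int:
        return 128000

    @property
    def maximum_output_token(self) -> int:
        return 4096
```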
tests/conftest.py (1)
97-102: LGTM! MockLLM correctly updated to property-based interface.
The conversion of get_context_window_size and get_maximum_output_token from methods to properties is correct and consistent with the LLM interface changes. This ensures that all tests using MockLLM will work with the new property-based access pattern.
tests/core/tools_utils/test_tool_context_window_limiter.py (1)
18-18: LGTM! Test mocks correctly updated for property-based LLM interface.
The changes from setting mock_llm.get_context_window_size.return_value to directly assigning mock_llm.context_window_size are correct for testing property access. The test logic and assertions remain unchanged, ensuring continued validation of the tool context window limiting functionality.
Also applies to: 119-119, 172-172, 201-201, 228-228
tests/core/test_feedback.py (1)
22-27: LGTM! MockLLM properties correctly implemented for feedback tests.
The conversion from methods to properties is correct. The context_window_size property returns self._context_size (allowing tests to configure different sizes), and maximum_output_token returns a fixed value of 1024. This implementation is appropriate for testing feedback functionality.
holmes/core/llm.py (3)
95-103: LGTM: Conversion to properties is appropriate.
Converting these from methods to properties is a good design choice for read-only configuration values.
10-10: LGTM: Imports properly placed.
The new imports are correctly placed at the top of the file, consistent with coding guidelines.
Also applies to: 22-22
110-110: LGTM: Property access syntax correctly applied.
The call site has been properly updated to use property access instead of method invocation.
Dev Docker images are ready for this commit:
Use either tag to pull the image for testing.
Actionable comments posted: 0
♻️ Duplicate comments (2)
holmes/core/llm.py (2)
171-176: Critical: Initialize instance variables in __init__ before update_custom_args() is called.
The instance variables _max_context_size and _max_output_tokens are conditionally assigned here from custom_args. If these keys don't exist in custom_args, the attributes will never be created, causing an AttributeError when the properties at lines 279 and 447 attempt to access them.

🔎 Proposed fix
Initialize both attributes to None in __init__ after line 164:

```diff
 self.name = name
 self.is_robusta_model = is_robusta_model
+self._max_context_size = None
+self._max_output_tokens = None
 self.update_custom_args()
```
447-448: Critical: Fix truthiness check for cached value.
The check if self._max_output_tokens: will incorrectly treat 0 as False and skip returning a cached value of 0. This should explicitly check for None to match the pattern used in context_window_size at line 279.

🔎 Proposed fix

```diff
-        if self._max_output_tokens:
+        if self._max_output_tokens is not None:
             return self._max_output_tokens
```
🧹 Nitpick comments (1)
holmes/core/llm.py (1)
290-294: Optional: Consider moving return statements to else blocks.
Ruff suggests restructuring the exception handling at lines 293 and 466 to make the control flow more explicit by moving the cache-and-return logic into else blocks.

🔎 Example refactor for lines 290-294

```diff
 for name in self._get_model_name_variants_for_lookup():
     try:
         self._max_context_size = litellm.model_cost[name]["max_input_tokens"]
-        return self._max_context_size
     except Exception:
         continue
+    else:
+        return self._max_context_size
```

Apply a similar pattern at lines 456-467 for maximum_output_token.

Also applies to: 456-467
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
holmes/core/llm.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use Ruff for formatting and linting (configured in pyproject.toml)
Type hints required (mypy configuration in pyproject.toml)
ALWAYS place Python imports at the top of the file, not inside functions or methods
Files:
holmes/core/llm.py
🧠 Learnings (3)
📓 Common learnings
Learnt from: CR
Repo: HolmesGPT/holmesgpt PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-21T13:17:48.366Z
Learning: Applies to holmes/core/tool_calling_llm.py : Use LiteLLM for multi-provider support (OpenAI, Anthropic, Azure, etc.) in LLM integration
📚 Learning: 2025-12-21T13:17:48.366Z
Learnt from: CR
Repo: HolmesGPT/holmesgpt PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-21T13:17:48.366Z
Learning: Applies to holmes/core/tool_calling_llm.py : Use LiteLLM for multi-provider support (OpenAI, Anthropic, Azure, etc.) in LLM integration
Applied to files:
holmes/core/llm.py
📚 Learning: 2025-12-23T13:33:24.074Z
Learnt from: mainred
Repo: HolmesGPT/holmesgpt PR: 1231
File: holmes/utils/console/logging.py:42-44
Timestamp: 2025-12-23T13:33:24.074Z
Learning: In holmes/utils/console/logging.py, the `import litellm` statement is intentionally placed inline within the `suppress_noisy_logs()` function (not at the module top) to prevent unexpected log output during module initialization. The litellm library can produce debug logs at import time, so importing it inline allows `suppress_debug_info = True` to be set immediately after the import, before any logs are emitted.
Applied to files:
holmes/core/llm.py
🧬 Code graph analysis (1)
holmes/core/llm.py (3)
  examples/custom_llm.py (2)
    - context_window_size (17-18)
    - maximum_output_token (21-22)
  tests/conftest.py (2)
    - context_window_size (98-99)
    - maximum_output_token (102-103)
  tests/core/test_feedback.py (2)
    - context_window_size (23-24)
    - maximum_output_token (27-28)
🪛 Ruff (0.14.10)
holmes/core/llm.py
293-293: Consider moving this statement to an else block
(TRY300)
466-466: Consider moving this statement to an else block
(TRY300)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: llm_evals
- GitHub Check: build (3.10)
- GitHub Check: build (3.11)
- GitHub Check: build (3.12)
- GitHub Check: build
🔇 Additional comments (2)
holmes/core/llm.py (2)
277-304: Good: Correct None check and caching implementation.
The property correctly uses if self._max_context_size is not None: to check for cached values, which properly handles 0 as a valid cached value. The caching logic ensures the warning at line 298 will only appear once per instance, addressing the PR objective of avoiding repeated warnings.
95-103: Approve the caching approach for context window and output token limits.
The implementation successfully addresses the PR objectives by:
- Converting methods to properties for cleaner access patterns
- Caching lookup results to avoid repeated litellm model_cost lookups
- Ensuring warning messages (lines 298, 471) only appear once per instance
Once the critical initialization and truthiness check issues are addressed, this will effectively eliminate the repeated warnings described in the PR objectives.
Also applies to: 277-304, 445-477
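To tie the review threads together, here is a hedged sketch of how the cached maximum_output_token property could look once the None check and else-block suggestions are applied; the 4096 default, the warning text, and the simplified name-variant helper are assumptions, not the merged implementation:

```python
import logging
from typing import Optional

import litellm


class LLMSketch:
    """Illustrative only: combines the review's None-check and else-block fixes."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._max_output_tokens: Optional[int] = None  # initialized in __init__, per the review

    def _get_model_name_variants_for_lookup(self) -> list:
        # Simplified stand-in: the real helper yields several candidate names.
        return [self.name]

    @property
    def maximum_output_token(self) -> int:
        # Explicit None check so a cached value of 0 is still honored.
        if self._max_output_tokens is not None:
            return self._max_output_tokens
        max_output_tokens = 4096  # assumed default cap
        for name in self._get_model_name_variants_for_lookup():
            try:
                litellm_max_output_tokens = litellm.model_cost[name]["max_output_tokens"]
            except Exception:
                continue
            else:
                # Successful lookup: clamp to litellm's limit, cache, and return.
                if litellm_max_output_tokens < max_output_tokens:
                    max_output_tokens = litellm_max_output_tokens
                self._max_output_tokens = max_output_tokens
                return self._max_output_tokens
        # Lookup failed for every name variant: warn once and fall back.
        logging.warning(
            "Could not determine max output tokens for model %s; using %d",
            self.name,
            max_output_tokens,
        )
        self._max_output_tokens = max_output_tokens
        return self._max_output_tokens
```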
Dev Docker images are ready for this commit:
Use either tag to pull the image for testing.
@moshemorad @arikalon1 May I have a review of this PR? Thanks.
HolmesGPT tries to calculate the max context and output size from https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json based on the model name. For Azure OpenAI Service, however, a deployment is located by its deployment name, not the model name; the deployment name defaults to the model name, but this is not guaranteed. When the deployment name does not match the model name, HolmesGPT prints an annoying warning log like the one below on every LLM call. This is unnecessary, since the context window size does not change at runtime.
Also, we need to suppress the litellm debug info when verbose logging is not enabled.
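A minimal sketch of that suppression, following the inline-import rationale recorded in the review learnings; the function signature, verbose flag, and extra logger tweak are assumptions, while the real logic lives in holmes/utils/console/logging.py:

```python
import logging


def suppress_noisy_logs(verbose: bool = False) -> None:
    """Silence litellm's debug output unless verbose logging was requested."""
    if verbose:
        return
    # Imported inline on purpose: litellm can emit debug output at import time,
    # so the flag is set immediately after the import, before any logs appear.
    import litellm

    litellm.suppress_debug_info = True
    logging.getLogger("LiteLLM").setLevel(logging.WARNING)
```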
Tests
Before the change, running holmes printed the annoying warning log on every LLM call.
After the change, the warning log about the model name lookup is printed only once.
Summary by CodeRabbit
Refactor
Tests