
Conversation

@mainred (Collaborator) commented Dec 23, 2025

HolmesGPT calculates the max context and output size by looking up the model name in https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json. For Azure OpenAI Service, however, it is the deployment name, not the model name, that identifies the endpoint; the deployment name defaults to the model name, but that is not guaranteed. When the deployment name does not match the model name, HolmesGPT prints the annoying warning below on every LLM call. This is unnecessary, since the window sizes do not change at runtime.

Couldn't find model azure/gpt-5.1-qinhao in litellm's model list (tried: azure/gpt-5.1-qinhao, gpt-5.1-qinhao), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE environment variable to
the correct value for your model.
Couldn't find model azure/gpt-5.1-qinhao in litellm's model list (tried: azure/gpt-5.1-qinhao, gpt-5.1-qinhao), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE environment variable to
the correct value for your model.
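A minimal sketch of the caching approach, assuming the backing field and lookup helper named in the review below (_max_context_size, _get_model_name_variants_for_lookup) and guessing at how the OVERRIDE_MAX_CONTENT_SIZE override and fallback constant are wired up; the real implementation is in holmes/core/llm.py:

import logging
import os
from typing import Optional

import litellm

FALLBACK_MAX_INPUT_TOKENS = 200_000  # assumed default used when the model is unknown


class DefaultLLM:
    def __init__(self, model: str):
        self.model = model  # e.g. "azure/gpt-5.1-qinhao" (an Azure deployment name)
        self._max_context_size: Optional[int] = None  # cache: lookup and warning run at most once

    def _get_model_name_variants_for_lookup(self) -> list:
        # Try the full name and the bare name without the provider prefix.
        return [self.model, self.model.split("/", 1)[-1]]

    @property
    def context_window_size(self) -> int:
        if self._max_context_size is not None:
            return self._max_context_size  # cached value, no repeated litellm lookup

        override = os.environ.get("OVERRIDE_MAX_CONTENT_SIZE")
        if override:
            self._max_context_size = int(override)
            return self._max_context_size

        for name in self._get_model_name_variants_for_lookup():
            try:
                self._max_context_size = litellm.model_cost[name]["max_input_tokens"]
                return self._max_context_size
            except KeyError:
                continue

        logging.warning(
            "Couldn't find model %s in litellm's model list, using default %d tokens "
            "for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE.",
            self.model,
            FALLBACK_MAX_INPUT_TOKENS,
        )
        self._max_context_size = FALLBACK_MAX_INPUT_TOKENS
        return self._max_context_size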

Also, we need to suppress litellm's debug info (the "Provider List" line below) when verbose logging is not enabled.

Provider List: https://docs.litellm.ai/docs/providers
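A sketch of the logging-helper change summarized by the review below (setting litellm.suppress_debug_info = True in suppress_noisy_logs); per the review discussion, the import is intentionally inline so the flag is set before litellm can log at import time, and the extra logger tweak here is an assumption:

import logging


def suppress_noisy_logs() -> None:
    # Imported inline on purpose: litellm can emit debug output (such as the
    # "Provider List" banner above) as soon as it is imported, so the flag is
    # set immediately after the import.
    import litellm

    litellm.suppress_debug_info = True
    # Assumed: the rest of the helper also lowers third-party logger verbosity.
    logging.getLogger("LiteLLM").setLevel(logging.WARNING)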

Tests

Before the change, running holmes prints the warning repeatedly, on every LLM call:

Using 5 datasources (toolsets). To refresh: use flag `--refresh-toolsets`
NO ENABLED LOGGING TOOLSET
Using selected model: azure/gpt-5.1-qinhao
Couldn't find model azure/gpt-5.1-qinhao in litellm's model list (tried: azure/gpt-5.1-qinhao, gpt-5.1-qinhao), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE environment variable to
the correct value for your model.
Couldn't find model azure/gpt-5.1-qinhao in litellm's model list (tried: azure/gpt-5.1-qinhao, gpt-5.1-qinhao), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE environment variable to
the correct value for your model.
Couldn't find model azure/gpt-5.1-qinhao in litellm's model list (tried: azure/gpt-5.1-qinhao, gpt-5.1-qinhao), using 40000 tokens for max_output_tokens. To override, set OVERRIDE_MAX_OUTPUT_TOKEN environment variable to the
correct value for your model.
Using model: azure/gpt-5.1-qinhao (200,000 total tokens, 40,000 output tokens)
Couldn't find model azure/gpt-5.1-qinhao in litellm's model list (tried: azure/gpt-5.1-qinhao, gpt-5.1-qinhao), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE environment variable to
the correct value for your model.
Welcome to HolmesGPT: Type '/exit' to exit, '/help' for commands.
User: how may pods are running fine in my cluster?

Thinking...

Couldn't find model azure/gpt-5.1-qinhao in litellm's model list (tried: azure/gpt-5.1-qinhao, gpt-5.1-qinhao), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE environment variable to
the correct value for your model.
Couldn't find model azure/gpt-5.1-qinhao in litellm's model list (tried: azure/gpt-5.1-qinhao, gpt-5.1-qinhao), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE environment variable to
the correct value for your model.
Couldn't find model azure/gpt-5.1-qinhao in litellm's model list (tried: azure/gpt-5.1-qinhao, gpt-5.1-qinhao), using 40000 tokens for max_output_tokens. To override, set OVERRIDE_MAX_OUTPUT_TOKEN environment variable to the
correct value for your model.

Provider List: https://docs.litellm.ai/docs/providers


Provider List: https://docs.litellm.ai/docs/providers

The AI requested 1 tool call(s).
Couldn't find model azure/gpt-5.1-qinhao in litellm's model list (tried: azure/gpt-5.1-qinhao, gpt-5.1-qinhao), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE environment variable to
the correct value for your model.
Running tool #1 fetch_runbook: Runbook: Fetch Runbook networking/dns_troubleshooting_instructions.md
  Finished #1 in 0.00s, output length: 7,933 characters (99 lines) - /show 1 to view contents
Couldn't find model azure/gpt-5.1-qinhao in litellm's model list (tried: azure/gpt-5.1-qinhao, gpt-5.1-qinhao), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE environment variable to
the correct value for your model.

Couldn't find model azure/gpt-5.1-qinhao in litellm's model list (tried: azure/gpt-5.1-qinhao, gpt-5.1-qinhao), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE environment variable to
the correct value for your model.
Couldn't find model azure/gpt-5.1-qinhao in litellm's model list (tried: azure/gpt-5.1-qinhao, gpt-5.1-qinhao), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE environment variable to
the correct value for your model.
Couldn't find model azure/gpt-5.1-qinhao in litellm's model list (tried: azure/gpt-5.1-qinhao, gpt-5.1-qinhao), using 40000 tokens for max_output_tokens. To override, set OVERRIDE_MAX_OUTPUT_TOKEN environment variable to the
correct value for your model.

Provider List: https://docs.litellm.ai/docs/providers


Provider List: https://docs.litellm.ai/docs/providers

The AI requested 1 tool call(s).
Couldn't find model azure/gpt-5.1-qinhao in litellm's model list (tried: azure/gpt-5.1-qinhao, gpt-5.1-qinhao), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE environment variable to
the correct value for your model.
Running tool #2 TodoWrite: Update investigation tasks
Task List:
+----+---------------------------------------------+-----------------+
| ID | Content                                     | Status          |
+----+---------------------------------------------+-----------------+
| 1  | Check overall pod status in all namespaces  | [~] in_progress |
| 2  | Count pods in Running phase and ready state | [ ] pending     |
| 3  | Final review against user question          | [ ] pending     |
+----+---------------------------------------------+-----------------+
  Finished #2 in 0.00s, output length: 490 characters (11 lines) - /show 2 to view contents
Couldn't find model azure/gpt-5.1-qinhao in litellm's model list (tried: azure/gpt-5.1-qinhao, gpt-5.1-qinhao), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE environment variable to
the correct value for your model.

Couldn't find model azure/gpt-5.1-qinhao in litellm's model list (tried: azure/gpt-5.1-qinhao, gpt-5.1-qinhao), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE environment variable to
the correct value for your model.
Couldn't find model azure/gpt-5.1-qinhao in litellm's model list (tried: azure/gpt-5.1-qinhao, gpt-5.1-qinhao), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE environment variable to
the correct value for your model.
Couldn't find model azure/gpt-5.1-qinhao in litellm's model list (tried: azure/gpt-5.1-qinhao, gpt-5.1-qinhao), using 40000 tokens for max_output_tokens. To override, set OVERRIDE_MAX_OUTPUT_TOKEN environment variable to the
correct value for your model.

Provider List: https://docs.litellm.ai/docs/providers


Provider List: https://docs.litellm.ai/docs/providers

After the change, the warning that the model name could not be found is printed only once:

 poetry run holmes ask "how may pods are running fine in my cluster?" --model azure/gpt-5.1-qinhao
Loaded models: ['azure/gpt-5.1-qinhao']
Toolset aks-mcp: 'url' field has been migrated to config. Please move 'url' to the config section.
✅ Toolset core_investigation
✅ Toolset internet
✅ Toolset runbook
✅ Toolset aks-mcp
Using 5 datasources (toolsets). To refresh: use flag `--refresh-toolsets`
NO ENABLED LOGGING TOOLSET
Using selected model: azure/gpt-5.1-qinhao
Couldn't find model azure/gpt-5.1-qinhao in litellm's model list (tried: azure/gpt-5.1-qinhao, gpt-5.1-qinhao), using default 200000 tokens for max_input_tokens. To override, set OVERRIDE_MAX_CONTENT_SIZE environment variable to
the correct value for your model.
Couldn't find model azure/gpt-5.1-qinhao in litellm's model list (tried: azure/gpt-5.1-qinhao, gpt-5.1-qinhao), using 40000 tokens for max_output_tokens. To override, set OVERRIDE_MAX_OUTPUT_TOKEN environment variable to the
correct value for your model.
Using model: azure/gpt-5.1-qinhao (200,000 total tokens, 40,000 output tokens)
Welcome to HolmesGPT: Type '/exit' to exit, '/help' for commands.
User: how may pods are running fine in my cluster?

Thinking...

The AI requested 1 tool call(s).
Running tool #1 fetch_runbook: Runbook: Fetch Runbook networking/dns_troubleshooting_instructions.md
  Finished #1 in 0.00s, output length: 7,933 characters (99 lines) - /show 1 to view contents

The AI requested 1 tool call(s).
Running tool #2 TodoWrite: Update investigation tasks

Summary by CodeRabbit

  • Refactor

    • LLM configuration values (context window size and maximum output tokens) are now exposed as read-only properties instead of callable methods. Custom LLM implementations should be updated to use the new property-based interface.
  • Tests

    • Test mocks and fixtures updated to use the new property-style LLM interface so tests reflect attribute-based access.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai coderabbitai bot (Contributor) commented Dec 23, 2025

Walkthrough

This PR refactors the LLM interface: method-based accessors get_context_window_size() and get_maximum_output_token() are replaced with read-only properties context_window_size and maximum_output_token. All internal call sites, example implementations, and tests are updated to use property access; logging and related logic reference the new properties.

Changes

  • Core LLM interface (holmes/core/llm.py): Convert abstract methods to @property for context_window_size and maximum_output_token; add backing fields _max_context_size and _max_output_tokens; update DefaultLLM internals and logging to use properties.
  • Core usage sites (holmes/core/conversations.py, holmes/core/feedback.py, holmes/core/tools_utils/tool_context_window_limiter.py, holmes/core/truncation/input_context_window_limiter.py): Replace calls llm.get_context_window_size() / llm.get_maximum_output_token() with llm.context_window_size / llm.maximum_output_token.
  • Application surface (holmes/config.py, holmes/interactive.py): Update logging and token-limit computations to read LLM properties instead of invoking getters; minor import reordering and small logging-level changes.
  • Examples (examples/custom_llm.py): Update example MyCustomLLM to expose context_window_size and maximum_output_token as @property instead of methods.
  • Logging helper (holmes/utils/console/logging.py): Add litellm.suppress_debug_info = True in suppress_noisy_logs to mute litellm debug output.
  • Tests & fixtures (tests/conftest.py, tests/core/test_feedback.py, tests/core/tools_utils/test_tool_context_window_limiter.py, tests/config_class/*, tests/test_approval_workflow.py, tests/test_interactive.py): Mocks and tests updated to set context_window_size and maximum_output_token attributes (or properties) rather than configuring method return values; minor import reorderings in some tests.

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • nherment
  • moshemorad

Pre-merge checks

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 52.08%, which is insufficient; the required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (2 passed)

  • Description Check (✅ Passed): Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check (✅ Passed): The title accurately reflects the main objective: converting get_context_window_size() and get_maximum_output_token() methods to cached properties to reduce repeated LLM lookups and warnings.

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
holmes/core/llm.py (1)

277-304: Major: Potential race condition in caching logic.

If multiple threads access this property before _max_context_size is cached, the lookup and warning log could be executed multiple times, defeating the purpose of caching. Consider using a lock or initializing the cache values during __init__ with a sentinel value to ensure single execution.

🔎 Proposed fix using lock

Add a lock to the class and use it during lookup:

# At module top (threading is in the standard library):
import threading

# In __init__:
self._cache_lock = threading.Lock()

# In the property:
@property
def context_window_size(self) -> int:
    if self._max_context_size is not None:
        return self._max_context_size
    
    with self._cache_lock:
        # Double-check after acquiring lock
        if self._max_context_size is not None:
            return self._max_context_size
        
        # ... rest of lookup logic ...
🧹 Nitpick comments (2)
holmes/core/llm.py (2)

290-295: Refactor: Consider moving return to else block.

The static analysis tool suggests restructuring this try-except to improve clarity by moving the successful return to an else block.

🔎 Proposed refactor
     for name in self._get_model_name_variants_for_lookup():
         try:
             self._max_context_size = litellm.model_cost[name]["max_input_tokens"]
-            return self._max_context_size
         except Exception:
             continue
+        else:
+            return self._max_context_size

Based on static analysis hints.


457-468: Refactor: Consider moving return to else block.

Similar to the context_window_size property, the static analysis tool suggests restructuring this try-except for clarity.

🔎 Proposed refactor
     for name in self._get_model_name_variants_for_lookup():
         try:
             litellm_max_output_tokens = litellm.model_cost[name][
                 "max_output_tokens"
             ]
             if litellm_max_output_tokens < max_output_tokens:
                 max_output_tokens = litellm_max_output_tokens

             self._max_output_tokens = max_output_tokens
-            return self._max_output_tokens
         except Exception:
             continue
+        else:
+            return self._max_output_tokens

Based on static analysis hints.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d8ec062 and 372f768.

📒 Files selected for processing (17)
  • examples/custom_llm.py
  • holmes/config.py
  • holmes/core/conversations.py
  • holmes/core/feedback.py
  • holmes/core/llm.py
  • holmes/core/tools_utils/tool_context_window_limiter.py
  • holmes/core/truncation/input_context_window_limiter.py
  • holmes/interactive.py
  • holmes/utils/console/logging.py
  • tests/config_class/test_config_api_base_version.py
  • tests/config_class/test_config_load_model_list.py
  • tests/config_class/test_config_load_robusta_ai.py
  • tests/conftest.py
  • tests/core/test_feedback.py
  • tests/core/tools_utils/test_tool_context_window_limiter.py
  • tests/test_approval_workflow.py
  • tests/test_interactive.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use Ruff for formatting and linting (configured in pyproject.toml)
Type hints required (mypy configuration in pyproject.toml)
ALWAYS place Python imports at the top of the file, not inside functions or methods

Files:

  • holmes/config.py
  • tests/test_approval_workflow.py
  • tests/conftest.py
  • tests/config_class/test_config_load_model_list.py
  • tests/core/tools_utils/test_tool_context_window_limiter.py
  • holmes/core/conversations.py
  • holmes/utils/console/logging.py
  • examples/custom_llm.py
  • tests/config_class/test_config_load_robusta_ai.py
  • holmes/core/truncation/input_context_window_limiter.py
  • tests/test_interactive.py
  • holmes/core/feedback.py
  • tests/config_class/test_config_api_base_version.py
  • tests/core/test_feedback.py
  • holmes/interactive.py
  • holmes/core/llm.py
  • holmes/core/tools_utils/tool_context_window_limiter.py
holmes/config.py

📄 CodeRabbit inference engine (CLAUDE.md)

Load configuration from ~/.holmes/config.yaml or via CLI options in config.py

Files:

  • holmes/config.py
tests/**

📄 CodeRabbit inference engine (CLAUDE.md)

Tests: Match source structure under tests/

Files:

  • tests/test_approval_workflow.py
  • tests/conftest.py
  • tests/config_class/test_config_load_model_list.py
  • tests/core/tools_utils/test_tool_context_window_limiter.py
  • tests/config_class/test_config_load_robusta_ai.py
  • tests/test_interactive.py
  • tests/config_class/test_config_api_base_version.py
  • tests/core/test_feedback.py
🧠 Learnings (3)
📚 Learning: 2025-12-21T13:17:48.366Z
Learnt from: CR
Repo: HolmesGPT/holmesgpt PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-21T13:17:48.366Z
Learning: Applies to holmes/core/tool_calling_llm.py : Use LiteLLM for multi-provider support (OpenAI, Anthropic, Azure, etc.) in LLM integration

Applied to files:

  • holmes/config.py
  • tests/test_approval_workflow.py
  • tests/core/tools_utils/test_tool_context_window_limiter.py
  • holmes/core/conversations.py
  • examples/custom_llm.py
  • holmes/core/truncation/input_context_window_limiter.py
  • tests/test_interactive.py
  • holmes/core/feedback.py
  • tests/config_class/test_config_api_base_version.py
  • holmes/core/llm.py
  • holmes/core/tools_utils/tool_context_window_limiter.py
📚 Learning: 2025-12-21T13:17:48.366Z
Learnt from: CR
Repo: HolmesGPT/holmesgpt PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-21T13:17:48.366Z
Learning: Applies to holmes/core/tool_calling_llm.py : Implement structured tool calling with automatic retry and error handling

Applied to files:

  • tests/core/tools_utils/test_tool_context_window_limiter.py
  • examples/custom_llm.py
  • tests/test_interactive.py
  • holmes/core/tools_utils/tool_context_window_limiter.py
📚 Learning: 2025-12-21T13:17:48.366Z
Learnt from: CR
Repo: HolmesGPT/holmesgpt PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-21T13:17:48.366Z
Learning: Applies to holmes/plugins/prompts/**/*.jinja2 : Prompts: holmes/plugins/prompts/{name}.jinja2 file structure

Applied to files:

  • holmes/core/conversations.py
  • examples/custom_llm.py
🧬 Code graph analysis (11)
holmes/config.py (1)
holmes/core/llm.py (4)
  • context_window_size (97-98)
  • context_window_size (278-304)
  • maximum_output_token (102-103)
  • maximum_output_token (446-477)
tests/conftest.py (3)
examples/custom_llm.py (2)
  • context_window_size (17-18)
  • maximum_output_token (21-22)
holmes/core/llm.py (4)
  • context_window_size (97-98)
  • context_window_size (278-304)
  • maximum_output_token (102-103)
  • maximum_output_token (446-477)
tests/core/test_feedback.py (2)
  • context_window_size (23-24)
  • maximum_output_token (27-28)
tests/config_class/test_config_load_model_list.py (1)
holmes/core/llm.py (4)
  • context_window_size (97-98)
  • context_window_size (278-304)
  • maximum_output_token (102-103)
  • maximum_output_token (446-477)
tests/core/tools_utils/test_tool_context_window_limiter.py (4)
examples/custom_llm.py (1)
  • context_window_size (17-18)
holmes/core/llm.py (2)
  • context_window_size (97-98)
  • context_window_size (278-304)
tests/conftest.py (1)
  • context_window_size (98-99)
tests/core/test_feedback.py (1)
  • context_window_size (23-24)
examples/custom_llm.py (1)
holmes/core/llm.py (6)
  • LLM (90-135)
  • TokenCountMetadata (56-63)
  • context_window_size (97-98)
  • context_window_size (278-304)
  • maximum_output_token (102-103)
  • maximum_output_token (446-477)
tests/config_class/test_config_load_robusta_ai.py (3)
examples/custom_llm.py (1)
  • context_window_size (17-18)
holmes/core/llm.py (2)
  • context_window_size (97-98)
  • context_window_size (278-304)
tests/conftest.py (1)
  • context_window_size (98-99)
holmes/core/feedback.py (4)
examples/custom_llm.py (1)
  • context_window_size (17-18)
holmes/core/llm.py (2)
  • context_window_size (97-98)
  • context_window_size (278-304)
tests/conftest.py (1)
  • context_window_size (98-99)
tests/core/test_feedback.py (1)
  • context_window_size (23-24)
tests/config_class/test_config_api_base_version.py (4)
examples/custom_llm.py (2)
  • context_window_size (17-18)
  • maximum_output_token (21-22)
holmes/core/llm.py (4)
  • context_window_size (97-98)
  • context_window_size (278-304)
  • maximum_output_token (102-103)
  • maximum_output_token (446-477)
tests/conftest.py (2)
  • context_window_size (98-99)
  • maximum_output_token (102-103)
tests/core/test_feedback.py (2)
  • context_window_size (23-24)
  • maximum_output_token (27-28)
holmes/interactive.py (4)
examples/custom_llm.py (2)
  • context_window_size (17-18)
  • maximum_output_token (21-22)
holmes/core/llm.py (4)
  • context_window_size (97-98)
  • context_window_size (278-304)
  • maximum_output_token (102-103)
  • maximum_output_token (446-477)
tests/conftest.py (2)
  • context_window_size (98-99)
  • maximum_output_token (102-103)
tests/core/test_feedback.py (2)
  • context_window_size (23-24)
  • maximum_output_token (27-28)
holmes/core/llm.py (3)
examples/custom_llm.py (2)
  • context_window_size (17-18)
  • maximum_output_token (21-22)
tests/conftest.py (2)
  • context_window_size (98-99)
  • maximum_output_token (102-103)
tests/core/test_feedback.py (2)
  • context_window_size (23-24)
  • maximum_output_token (27-28)
holmes/core/tools_utils/tool_context_window_limiter.py (5)
holmes/core/llm.py (2)
  • context_window_size (97-98)
  • context_window_size (278-304)
holmes/core/tools.py (1)
  • StructuredToolResultStatus (52-76)
examples/custom_llm.py (1)
  • context_window_size (17-18)
tests/conftest.py (1)
  • context_window_size (98-99)
tests/core/test_feedback.py (1)
  • context_window_size (23-24)
🪛 Ruff (0.14.10)
holmes/core/llm.py

293-293: Consider moving this statement to an else block

(TRY300)


466-466: Consider moving this statement to an else block

(TRY300)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: llm_evals
  • GitHub Check: build (3.10)
  • GitHub Check: build (3.11)
  • GitHub Check: build (3.12)
  • GitHub Check: build
🔇 Additional comments (20)
holmes/core/feedback.py (1)

29-31: LGTM!

The migration from method call to property access is correct and aligns with the broader refactoring to cache LLM configuration values.

holmes/config.py (1)

517-519: LGTM!

The log message correctly uses the property-based access pattern for LLM configuration values, consistent with the broader refactoring to cache these values and reduce repeated lookups.

tests/test_interactive.py (1)

296-301: LGTM!

The test setup correctly updates the mock to use direct attribute assignment for context_window_size, consistent with the migration to property-based access across the LLM interface.

holmes/core/tools_utils/tool_context_window_limiter.py (1)

14-15: LGTM!

The change from method call to property access is correct and aligns with the project-wide refactoring to use cached LLM configuration values.

tests/config_class/test_config_load_robusta_ai.py (1)

143-150: LGTM!

The test correctly verifies the context window size using property access, consistent with the broader migration to property-based LLM configuration.

tests/config_class/test_config_load_model_list.py (1)

57-62: LGTM!

The mock setup correctly uses direct attribute assignment for the property-based LLM interface, ensuring tests align with the refactored API.

tests/test_approval_workflow.py (3)

84-102: LGTM!

The mock setup correctly uses direct attribute assignment for context_window_size and maximum_output_token, consistent with the property-based LLM interface.


218-236: LGTM!

The mock setup correctly uses direct attribute assignment for the property-based LLM configuration values.


365-383: LGTM!

The mock setup correctly uses direct attribute assignment, consistent with the refactored property-based LLM interface.

holmes/core/truncation/input_context_window_limiter.py (1)

153-154: LGTM! Property access correctly replaces method calls.

The change from llm.get_context_window_size() and llm.get_maximum_output_token() to property access (llm.context_window_size and llm.maximum_output_token) is correct and aligns with the broader refactor. This enables the caching behavior described in the PR objectives, where the underlying properties in holmes/core/llm.py cache the resolved values to avoid repeated litellm lookups and warnings.

tests/config_class/test_config_api_base_version.py (1)

48-49: LGTM! Test mocks correctly updated for property-based interface.

The test updates consistently replace mock_llm_instance.get_context_window_size.return_value with direct attribute assignment mock_llm_instance.context_window_size = 128000, which is the correct approach for mocking properties. All test cases maintain their original assertions and logic while adapting to the new interface.

Also applies to: 77-78, 114-115, 152-153, 190-191, 241-242, 300-301, 340-341, 380-381, 421-422, 462-463
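A minimal sketch of this mocking pattern; the mock name and numbers are illustrative:

from unittest.mock import MagicMock

mock_llm_instance = MagicMock()

# Old, method-based interface: configure a return value.
# mock_llm_instance.get_context_window_size.return_value = 128000

# New, property-based interface: the value is read as a plain attribute,
# so direct assignment on the mock is enough.
mock_llm_instance.context_window_size = 128000
mock_llm_instance.maximum_output_token = 4096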

holmes/interactive.py (1)

496-497: LGTM! Property access correctly replaces method calls in interactive context display.

The changes from ai.llm.get_context_window_size() and ai.llm.get_maximum_output_token() to property access are correct. These values are used to calculate and display conversation context statistics in the /context command, and the property-based access will benefit from the caching implemented in the underlying LLM class.

holmes/core/conversations.py (1)

27-30: LGTM! Property access correctly replaces method calls in tool size calculation.

The conversion to property access (ai.llm.context_window_size and ai.llm.maximum_output_token) in the calculate_tool_size function is correct. Since this function is called multiple times during conversation building, the caching behavior of these properties will improve performance by avoiding repeated litellm lookups.

examples/custom_llm.py (1)

16-21: LGTM! Example correctly demonstrates property-based LLM interface.

The conversion from methods to properties using @property decorators is correct and provides a clear example for users implementing custom LLM classes. The return values remain unchanged (128000 for context window, 4096 for max output tokens), ensuring the example continues to work as expected.
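Based on this comment, the updated example presumably reads roughly as follows (other abstract members of LLM are omitted, and the exact base-class signature may differ):

from holmes.core.llm import LLM


class MyCustomLLM(LLM):
    @property
    def context_window_size(self) -> int:  # previously: def get_context_window_size(self)
        return 128000

    @property
    def maximum_output_token(self) -> int:  # previously: def get_maximum_output_token(self)
        return 4096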

tests/conftest.py (1)

97-102: LGTM! MockLLM correctly updated to property-based interface.

The conversion of get_context_window_size and get_maximum_output_token from methods to properties is correct and consistent with the LLM interface changes. This ensures that all tests using MockLLM will work with the new property-based access pattern.

tests/core/tools_utils/test_tool_context_window_limiter.py (1)

18-18: LGTM! Test mocks correctly updated for property-based LLM interface.

The changes from setting mock_llm.get_context_window_size.return_value to directly assigning mock_llm.context_window_size are correct for testing property access. The test logic and assertions remain unchanged, ensuring continued validation of the tool context window limiting functionality.

Also applies to: 119-119, 172-172, 201-201, 228-228

tests/core/test_feedback.py (1)

22-27: LGTM! MockLLM properties correctly implemented for feedback tests.

The conversion from methods to properties is correct. The context_window_size property returns self._context_size (allowing tests to configure different sizes), and maximum_output_token returns a fixed value of 1024. This implementation is appropriate for testing feedback functionality.

holmes/core/llm.py (3)

95-103: LGTM: Conversion to properties is appropriate.

Converting these from methods to properties is a good design choice for read-only configuration values.


10-10: LGTM: Imports properly placed.

The new imports are correctly placed at the top of the file, consistent with coding guidelines.

Also applies to: 22-22


110-110: LGTM: Property access syntax correctly applied.

The call site has been properly updated to use property access instead of method invocation.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (2)
holmes/core/llm.py (2)

171-176: Critical: Initialize instance variables in __init__ before update_custom_args() is called.

The instance variables _max_context_size and _max_output_tokens are conditionally assigned here from custom_args. If these keys don't exist in custom_args, the attributes will never be created, causing an AttributeError when the properties at lines 279 and 447 attempt to access them.

🔎 Proposed fix

Initialize both attributes to None in __init__ after line 164:

     self.name = name
     self.is_robusta_model = is_robusta_model
+    self._max_context_size = None
+    self._max_output_tokens = None
     self.update_custom_args()

447-448: Critical: Fix truthiness check for cached value.

The check if self._max_output_tokens: will incorrectly treat 0 as False and skip returning a cached value of 0. This should explicitly check for None to match the pattern used in context_window_size at line 279.

🔎 Proposed fix
-    if self._max_output_tokens:
+    if self._max_output_tokens is not None:
         return self._max_output_tokens
🧹 Nitpick comments (1)
holmes/core/llm.py (1)

290-294: Optional: Consider moving return statements to else blocks.

Ruff suggests restructuring the exception handling at lines 293 and 466 to make the control flow more explicit by moving the cache-and-return logic into else blocks.

🔎 Example refactor for lines 290-294
         for name in self._get_model_name_variants_for_lookup():
             try:
                 self._max_context_size = litellm.model_cost[name]["max_input_tokens"]
-                return self._max_context_size
             except Exception:
                 continue
+            else:
+                return self._max_context_size

Apply similar pattern at lines 456-467 for maximum_output_token.

Also applies to: 456-467

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 372f768 and 51699dc.

📒 Files selected for processing (1)
  • holmes/core/llm.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use Ruff for formatting and linting (configured in pyproject.toml)
Type hints required (mypy configuration in pyproject.toml)
ALWAYS place Python imports at the top of the file, not inside functions or methods

Files:

  • holmes/core/llm.py
🧠 Learnings (3)
📓 Common learnings
Learnt from: CR
Repo: HolmesGPT/holmesgpt PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-21T13:17:48.366Z
Learning: Applies to holmes/core/tool_calling_llm.py : Use LiteLLM for multi-provider support (OpenAI, Anthropic, Azure, etc.) in LLM integration
📚 Learning: 2025-12-21T13:17:48.366Z
Learnt from: CR
Repo: HolmesGPT/holmesgpt PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-21T13:17:48.366Z
Learning: Applies to holmes/core/tool_calling_llm.py : Use LiteLLM for multi-provider support (OpenAI, Anthropic, Azure, etc.) in LLM integration

Applied to files:

  • holmes/core/llm.py
📚 Learning: 2025-12-23T13:33:24.074Z
Learnt from: mainred
Repo: HolmesGPT/holmesgpt PR: 1231
File: holmes/utils/console/logging.py:42-44
Timestamp: 2025-12-23T13:33:24.074Z
Learning: In holmes/utils/console/logging.py, the `import litellm` statement is intentionally placed inline within the `suppress_noisy_logs()` function (not at the module top) to prevent unexpected log output during module initialization. The litellm library can produce debug logs at import time, so importing it inline allows `suppress_debug_info = True` to be set immediately after the import, before any logs are emitted.

Applied to files:

  • holmes/core/llm.py
🧬 Code graph analysis (1)
holmes/core/llm.py (3)
examples/custom_llm.py (2)
  • context_window_size (17-18)
  • maximum_output_token (21-22)
tests/conftest.py (2)
  • context_window_size (98-99)
  • maximum_output_token (102-103)
tests/core/test_feedback.py (2)
  • context_window_size (23-24)
  • maximum_output_token (27-28)
🪛 Ruff (0.14.10)
holmes/core/llm.py

293-293: Consider moving this statement to an else block

(TRY300)


466-466: Consider moving this statement to an else block

(TRY300)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: llm_evals
  • GitHub Check: build (3.10)
  • GitHub Check: build (3.11)
  • GitHub Check: build (3.12)
  • GitHub Check: build
🔇 Additional comments (2)
holmes/core/llm.py (2)

277-304: Good: Correct None check and caching implementation.

The property correctly uses if self._max_context_size is not None: to check for cached values, which properly handles 0 as a valid cached value. The caching logic ensures the warning at line 298 will only appear once per instance, addressing the PR objective of avoiding repeated warnings.


95-103: Approve the caching approach for context window and output token limits.

The implementation successfully addresses the PR objectives by:

  • Converting methods to properties for cleaner access patterns
  • Caching lookup results to avoid repeated litellm model_cost lookups
  • Ensuring warning messages (lines 298, 471) only appear once per instance

Once the critical initialization and truthiness check issues are addressed, this will effectively eliminate the repeated warnings described in the PR objectives.

Also applies to: 277-304, 445-477

@github-actions (Contributor)

Results of HolmesGPT evals

  • ask_holmes: 29/37 test cases were successful, 3 regressions, 2 setup failures, 3 mock failures
Test suite Test case Status
ask 01_how_many_pods
ask 02_what_is_wrong_with_pod
ask 04_related_k8s_events
ask 05_image_version
ask 09_crashpod
ask 10_image_pull_backoff
ask 110_k8s_events_image_pull
ask 11_init_containers
ask 13a_pending_node_selector_basic
ask 14_pending_resources
ask 15_failed_readiness_probe
ask 163_compaction_follow_up
ask 17_oom_kill 🚧
ask 18_oom_kill_from_issues_history
ask 19_detect_missing_app_details
ask 20_long_log_file_search
ask 24_misconfigured_pvc
ask 24a_misconfigured_pvc_basic
ask 28_permissions_error 🚧
ask 39_failed_toolset
ask 41_setup_argo
ask 42_dns_issues_steps_new_tools
ask 43_current_datetime_from_prompt
ask 45_fetch_deployment_logs_simple
ask 51_logs_summarize_errors
ask 53_logs_find_term
ask 54_not_truncated_when_getting_pods
ask 59_label_based_counting
ask 60_count_less_than
ask 61_exact_match_counting
ask 63_fetch_error_logs_no_errors
ask 79_configmap_mount_issue
ask 83_secret_not_found
ask 86_configmap_like_but_secret
ask 93_calling_datadog[0] 🔧
ask 93_calling_datadog[1] 🔧
ask 93_calling_datadog[2] 🔧

Legend

  • ✅ the test was successful
  • ➖ the test was skipped
  • ⚠️ the test failed but is known to be flaky or known to fail
  • 🚧 the test had a setup failure (not a code regression)
  • 🔧 the test failed due to mock data issues (not a code regression)
  • 🚫 the test was throttled by API rate limits/overload
  • ❌ the test failed and should be fixed before merging the PR

@mainred (Collaborator, Author) commented Dec 24, 2025

@moshemorad @arikalon1 May I have a review of this PR? Thanks.

