fix: cache llm windows size #1231
base: master
Conversation
Walkthrough
This PR refactors the LLM interface: the method-based accessors get_context_window_size() and get_maximum_output_token() are replaced by cached properties context_window_size and maximum_output_token, with call sites and tests updated accordingly.

Changes

Sequence Diagram(s): omitted

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

Pre-merge checks
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
holmes/core/llm.py (1)
277-304: Major: Potential race condition in caching logic.
If multiple threads access this property before _max_context_size is cached, the lookup and warning log could be executed multiple times, defeating the purpose of caching. Consider using a lock or initializing the cache values during __init__ with a sentinel value to ensure single execution.

🔎 Proposed fix using lock
Add a lock to the class and use it during lookup:
```python
# In __init__:
self._cache_lock = threading.Lock()

# In the property:
@property
def context_window_size(self) -> int:
    if self._max_context_size is not None:
        return self._max_context_size
    with self._cache_lock:
        # Double-check after acquiring lock
        if self._max_context_size is not None:
            return self._max_context_size
        # ... rest of lookup logic ...
```
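For context, a self-contained sketch of the double-checked-locking approach suggested above; the class name, single-name lookup, and 128000 fallback are illustrative assumptions rather than the actual holmes/core/llm.py code:

```python
import threading
from typing import Optional

import litellm


class CachedLLM:
    """Minimal sketch of a thread-safe, cached context-window lookup."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._max_context_size: Optional[int] = None  # None = not resolved yet
        self._cache_lock = threading.Lock()

    @property
    def context_window_size(self) -> int:
        # Fast path: already cached, no locking needed.
        if self._max_context_size is not None:
            return self._max_context_size
        with self._cache_lock:
            # Double-check after acquiring the lock so the lookup (and any
            # warning it logs) runs at most once per instance.
            if self._max_context_size is not None:
                return self._max_context_size
            try:
                self._max_context_size = litellm.model_cost[self.name]["max_input_tokens"]
            except KeyError:
                self._max_context_size = 128000  # hypothetical fallback default
            return self._max_context_size
```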
🧹 Nitpick comments (2)
holmes/core/llm.py (2)
290-295: Refactor: Consider moving return to else block.
The static analysis tool suggests restructuring this try-except to improve clarity by moving the successful return to an else block.
🔎 Proposed refactor
```diff
 for name in self._get_model_name_variants_for_lookup():
     try:
         self._max_context_size = litellm.model_cost[name]["max_input_tokens"]
-        return self._max_context_size
     except Exception:
         continue
+    else:
+        return self._max_context_size
```

Based on static analysis hints.
457-468: Refactor: Consider moving return to else block.
Similar to the context_window_size property, the static analysis tool suggests restructuring this try-except for clarity.

🔎 Proposed refactor

```diff
 for name in self._get_model_name_variants_for_lookup():
     try:
         litellm_max_output_tokens = litellm.model_cost[name][
             "max_output_tokens"
         ]
         if litellm_max_output_tokens < max_output_tokens:
             max_output_tokens = litellm_max_output_tokens
         self._max_output_tokens = max_output_tokens
-        return self._max_output_tokens
     except Exception:
         continue
+    else:
+        return self._max_output_tokens
```

Based on static analysis hints.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (17)
- examples/custom_llm.py
- holmes/config.py
- holmes/core/conversations.py
- holmes/core/feedback.py
- holmes/core/llm.py
- holmes/core/tools_utils/tool_context_window_limiter.py
- holmes/core/truncation/input_context_window_limiter.py
- holmes/interactive.py
- holmes/utils/console/logging.py
- tests/config_class/test_config_api_base_version.py
- tests/config_class/test_config_load_model_list.py
- tests/config_class/test_config_load_robusta_ai.py
- tests/conftest.py
- tests/core/test_feedback.py
- tests/core/tools_utils/test_tool_context_window_limiter.py
- tests/test_approval_workflow.py
- tests/test_interactive.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use Ruff for formatting and linting (configured in pyproject.toml)
Type hints required (mypy configuration in pyproject.toml)
ALWAYS place Python imports at the top of the file, not inside functions or methods
Files:
- holmes/config.py
- tests/test_approval_workflow.py
- tests/conftest.py
- tests/config_class/test_config_load_model_list.py
- tests/core/tools_utils/test_tool_context_window_limiter.py
- holmes/core/conversations.py
- holmes/utils/console/logging.py
- examples/custom_llm.py
- tests/config_class/test_config_load_robusta_ai.py
- holmes/core/truncation/input_context_window_limiter.py
- tests/test_interactive.py
- holmes/core/feedback.py
- tests/config_class/test_config_api_base_version.py
- tests/core/test_feedback.py
- holmes/interactive.py
- holmes/core/llm.py
- holmes/core/tools_utils/tool_context_window_limiter.py
holmes/config.py
📄 CodeRabbit inference engine (CLAUDE.md)
Load configuration from ~/.holmes/config.yaml or via CLI options in config.py
Files:
holmes/config.py
tests/**
📄 CodeRabbit inference engine (CLAUDE.md)
Tests: Match source structure under tests/
Files:
- tests/test_approval_workflow.py
- tests/conftest.py
- tests/config_class/test_config_load_model_list.py
- tests/core/tools_utils/test_tool_context_window_limiter.py
- tests/config_class/test_config_load_robusta_ai.py
- tests/test_interactive.py
- tests/config_class/test_config_api_base_version.py
- tests/core/test_feedback.py
🧠 Learnings (3)
📚 Learning: 2025-12-21T13:17:48.366Z
Learnt from: CR
Repo: HolmesGPT/holmesgpt PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-21T13:17:48.366Z
Learning: Applies to holmes/core/tool_calling_llm.py : Use LiteLLM for multi-provider support (OpenAI, Anthropic, Azure, etc.) in LLM integration
Applied to files:
- holmes/config.py
- tests/test_approval_workflow.py
- tests/core/tools_utils/test_tool_context_window_limiter.py
- holmes/core/conversations.py
- examples/custom_llm.py
- holmes/core/truncation/input_context_window_limiter.py
- tests/test_interactive.py
- holmes/core/feedback.py
- tests/config_class/test_config_api_base_version.py
- holmes/core/llm.py
- holmes/core/tools_utils/tool_context_window_limiter.py
📚 Learning: 2025-12-21T13:17:48.366Z
Learnt from: CR
Repo: HolmesGPT/holmesgpt PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-21T13:17:48.366Z
Learning: Applies to holmes/core/tool_calling_llm.py : Implement structured tool calling with automatic retry and error handling
Applied to files:
- tests/core/tools_utils/test_tool_context_window_limiter.py
- examples/custom_llm.py
- tests/test_interactive.py
- holmes/core/tools_utils/tool_context_window_limiter.py
📚 Learning: 2025-12-21T13:17:48.366Z
Learnt from: CR
Repo: HolmesGPT/holmesgpt PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-21T13:17:48.366Z
Learning: Applies to holmes/plugins/prompts/**/*.jinja2 : Prompts: holmes/plugins/prompts/{name}.jinja2 file structure
Applied to files:
- holmes/core/conversations.py
- examples/custom_llm.py
🧬 Code graph analysis (11)
holmes/config.py (1)
  holmes/core/llm.py (4)
    - context_window_size (97-98)
    - context_window_size (278-304)
    - maximum_output_token (102-103)
    - maximum_output_token (446-477)
tests/conftest.py (3)
  examples/custom_llm.py (2)
    - context_window_size (17-18)
    - maximum_output_token (21-22)
  holmes/core/llm.py (4)
    - context_window_size (97-98)
    - context_window_size (278-304)
    - maximum_output_token (102-103)
    - maximum_output_token (446-477)
  tests/core/test_feedback.py (2)
    - context_window_size (23-24)
    - maximum_output_token (27-28)
tests/config_class/test_config_load_model_list.py (1)
  holmes/core/llm.py (4)
    - context_window_size (97-98)
    - context_window_size (278-304)
    - maximum_output_token (102-103)
    - maximum_output_token (446-477)
tests/core/tools_utils/test_tool_context_window_limiter.py (4)
  examples/custom_llm.py (1)
    - context_window_size (17-18)
  holmes/core/llm.py (2)
    - context_window_size (97-98)
    - context_window_size (278-304)
  tests/conftest.py (1)
    - context_window_size (98-99)
  tests/core/test_feedback.py (1)
    - context_window_size (23-24)
examples/custom_llm.py (1)
  holmes/core/llm.py (6)
    - LLM (90-135)
    - TokenCountMetadata (56-63)
    - context_window_size (97-98)
    - context_window_size (278-304)
    - maximum_output_token (102-103)
    - maximum_output_token (446-477)
tests/config_class/test_config_load_robusta_ai.py (3)
  examples/custom_llm.py (1)
    - context_window_size (17-18)
  holmes/core/llm.py (2)
    - context_window_size (97-98)
    - context_window_size (278-304)
  tests/conftest.py (1)
    - context_window_size (98-99)
holmes/core/feedback.py (4)
  examples/custom_llm.py (1)
    - context_window_size (17-18)
  holmes/core/llm.py (2)
    - context_window_size (97-98)
    - context_window_size (278-304)
  tests/conftest.py (1)
    - context_window_size (98-99)
  tests/core/test_feedback.py (1)
    - context_window_size (23-24)
tests/config_class/test_config_api_base_version.py (4)
  examples/custom_llm.py (2)
    - context_window_size (17-18)
    - maximum_output_token (21-22)
  holmes/core/llm.py (4)
    - context_window_size (97-98)
    - context_window_size (278-304)
    - maximum_output_token (102-103)
    - maximum_output_token (446-477)
  tests/conftest.py (2)
    - context_window_size (98-99)
    - maximum_output_token (102-103)
  tests/core/test_feedback.py (2)
    - context_window_size (23-24)
    - maximum_output_token (27-28)
holmes/interactive.py (4)
  examples/custom_llm.py (2)
    - context_window_size (17-18)
    - maximum_output_token (21-22)
  holmes/core/llm.py (4)
    - context_window_size (97-98)
    - context_window_size (278-304)
    - maximum_output_token (102-103)
    - maximum_output_token (446-477)
  tests/conftest.py (2)
    - context_window_size (98-99)
    - maximum_output_token (102-103)
  tests/core/test_feedback.py (2)
    - context_window_size (23-24)
    - maximum_output_token (27-28)
holmes/core/llm.py (3)
  examples/custom_llm.py (2)
    - context_window_size (17-18)
    - maximum_output_token (21-22)
  tests/conftest.py (2)
    - context_window_size (98-99)
    - maximum_output_token (102-103)
  tests/core/test_feedback.py (2)
    - context_window_size (23-24)
    - maximum_output_token (27-28)
holmes/core/tools_utils/tool_context_window_limiter.py (5)
  holmes/core/llm.py (2)
    - context_window_size (97-98)
    - context_window_size (278-304)
  holmes/core/tools.py (1)
    - StructuredToolResultStatus (52-76)
  examples/custom_llm.py (1)
    - context_window_size (17-18)
  tests/conftest.py (1)
    - context_window_size (98-99)
  tests/core/test_feedback.py (1)
    - context_window_size (23-24)
🪛 Ruff (0.14.10)
holmes/core/llm.py
293-293: Consider moving this statement to an else block
(TRY300)
466-466: Consider moving this statement to an else block
(TRY300)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: llm_evals
- GitHub Check: build (3.10)
- GitHub Check: build (3.11)
- GitHub Check: build (3.12)
- GitHub Check: build
🔇 Additional comments (20)
holmes/core/feedback.py (1)
29-31: LGTM! The migration from method call to property access is correct and aligns with the broader refactoring to cache LLM configuration values.
holmes/config.py (1)
517-519: LGTM! The log message correctly uses the property-based access pattern for LLM configuration values, consistent with the broader refactoring to cache these values and reduce repeated lookups.
tests/test_interactive.py (1)
296-301: LGTM! The test setup correctly updates the mock to use direct attribute assignment for context_window_size, consistent with the migration to property-based access across the LLM interface.
holmes/core/tools_utils/tool_context_window_limiter.py (1)
14-15: LGTM! The change from method call to property access is correct and aligns with the project-wide refactoring to use cached LLM configuration values.
tests/config_class/test_config_load_robusta_ai.py (1)
143-150: LGTM! The test correctly verifies the context window size using property access, consistent with the broader migration to property-based LLM configuration.
tests/config_class/test_config_load_model_list.py (1)
57-62: LGTM! The mock setup correctly uses direct attribute assignment for the property-based LLM interface, ensuring tests align with the refactored API.
tests/test_approval_workflow.py (3)
84-102: LGTM! The mock setup correctly uses direct attribute assignment for context_window_size and maximum_output_token, consistent with the property-based LLM interface.
218-236: LGTM! The mock setup correctly uses direct attribute assignment for the property-based LLM configuration values.
365-383: LGTM! The mock setup correctly uses direct attribute assignment, consistent with the refactored property-based LLM interface.
holmes/core/truncation/input_context_window_limiter.py (1)
153-154: LGTM! Property access correctly replaces method calls.
The change from llm.get_context_window_size() and llm.get_maximum_output_token() to property access (llm.context_window_size and llm.maximum_output_token) is correct and aligns with the broader refactor. This enables the caching behavior described in the PR objectives, where the underlying properties in holmes/core/llm.py cache the resolved values to avoid repeated litellm lookups and warnings.
tests/config_class/test_config_api_base_version.py (1)
48-49: LGTM! Test mocks correctly updated for property-based interface.
The test updates consistently replace mock_llm_instance.get_context_window_size.return_value with direct attribute assignment mock_llm_instance.context_window_size = 128000, which is the correct approach for mocking properties. All test cases maintain their original assertions and logic while adapting to the new interface.
Also applies to: 77-78, 114-115, 152-153, 190-191, 241-242, 300-301, 340-341, 380-381, 421-422, 462-463
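As an illustration of the mocking pattern described here, a minimal sketch (the mock object name mirrors the review text; the surrounding test code is assumed):

```python
from unittest.mock import MagicMock

mock_llm_instance = MagicMock()

# Old method-based interface: configure the mocked method's return value.
mock_llm_instance.get_context_window_size.return_value = 128000

# New property-based interface: plain attribute assignment is enough, because
# attributes set on a MagicMock are returned as-is when read back.
mock_llm_instance.context_window_size = 128000
mock_llm_instance.maximum_output_token = 4096

assert mock_llm_instance.context_window_size == 128000
```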
holmes/interactive.py (1)
496-497: LGTM! Property access correctly replaces method calls in interactive context display.
The changes from ai.llm.get_context_window_size() and ai.llm.get_maximum_output_token() to property access are correct. These values are used to calculate and display conversation context statistics in the /context command, and the property-based access will benefit from the caching implemented in the underlying LLM class.
holmes/core/conversations.py (1)
27-30: LGTM! Property access correctly replaces method calls in tool size calculation.
The conversion to property access (ai.llm.context_window_size and ai.llm.maximum_output_token) in the calculate_tool_size function is correct. Since this function is called multiple times during conversation building, the caching behavior of these properties will improve performance by avoiding repeated litellm lookups.
examples/custom_llm.py (1)
16-21: LGTM! Example correctly demonstrates property-based LLM interface.
The conversion from methods to properties using @property decorators is correct and provides a clear example for users implementing custom LLM classes. The return values remain unchanged (128000 for context window, 4096 for max output tokens), ensuring the example continues to work as expected.
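A rough sketch of what such a property-based custom LLM looks like (class name is an assumption and the base class is omitted; the real code lives in examples/custom_llm.py):

```python
class MyCustomLLM:
    # In the real example this class implements the holmes LLM interface;
    # only the two accessors relevant to this PR are sketched here.

    @property
    def context_window_size(self) -> int:
        return 128000

    @property
    def maximum_output_token(self) -> int:
        return 4096
```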
tests/conftest.py (1)
97-102: LGTM! MockLLM correctly updated to property-based interface.
The conversion of get_context_window_size and get_maximum_output_token from methods to properties is correct and consistent with the LLM interface changes. This ensures that all tests using MockLLM will work with the new property-based access pattern.
tests/core/tools_utils/test_tool_context_window_limiter.py (1)
18-18: LGTM! Test mocks correctly updated for property-based LLM interface.
The changes from setting mock_llm.get_context_window_size.return_value to directly assigning mock_llm.context_window_size are correct for testing property access. The test logic and assertions remain unchanged, ensuring continued validation of the tool context window limiting functionality.
Also applies to: 119-119, 172-172, 201-201, 228-228
tests/core/test_feedback.py (1)
22-27: LGTM! MockLLM properties correctly implemented for feedback tests.
The conversion from methods to properties is correct. The context_window_size property returns self._context_size (allowing tests to configure different sizes), and maximum_output_token returns a fixed value of 1024. This implementation is appropriate for testing feedback functionality.
holmes/core/llm.py (3)
95-103: LGTM: Conversion to properties is appropriate.
Converting these from methods to properties is a good design choice for read-only configuration values.
10-10: LGTM: Imports properly placed.
The new imports are correctly placed at the top of the file, consistent with coding guidelines.
Also applies to: 22-22
110-110: LGTM: Property access syntax correctly applied.
The call site has been properly updated to use property access instead of method invocation.
Dev Docker images are ready for this commit:
Use either tag to pull the image for testing.
Actionable comments posted: 0
♻️ Duplicate comments (2)
holmes/core/llm.py (2)
171-176: Critical: Initialize instance variables in __init__ before update_custom_args() is called.
The instance variables _max_context_size and _max_output_tokens are conditionally assigned here from custom_args. If these keys don't exist in custom_args, the attributes will never be created, causing an AttributeError when the properties at lines 279 and 447 attempt to access them.

🔎 Proposed fix
Initialize both attributes to None in __init__ after line 164:

```diff
 self.name = name
 self.is_robusta_model = is_robusta_model
+self._max_context_size = None
+self._max_output_tokens = None
 self.update_custom_args()
```
447-448: Critical: Fix truthiness check for cached value.
The check if self._max_output_tokens: will incorrectly treat 0 as False and skip returning a cached value of 0. This should explicitly check for None to match the pattern used in context_window_size at line 279.

🔎 Proposed fix

```diff
-        if self._max_output_tokens:
+        if self._max_output_tokens is not None:
             return self._max_output_tokens
```
🧹 Nitpick comments (1)
holmes/core/llm.py (1)
290-294: Optional: Consider moving return statements to else blocks.
Ruff suggests restructuring the exception handling at lines 293 and 466 to make the control flow more explicit by moving the cache-and-return logic into else blocks.

🔎 Example refactor for lines 290-294

```diff
 for name in self._get_model_name_variants_for_lookup():
     try:
         self._max_context_size = litellm.model_cost[name]["max_input_tokens"]
-        return self._max_context_size
     except Exception:
         continue
+    else:
+        return self._max_context_size
```

Apply a similar pattern at lines 456-467 for maximum_output_token.

Also applies to: 456-467
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
holmes/core/llm.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use Ruff for formatting and linting (configured in pyproject.toml)
Type hints required (mypy configuration in pyproject.toml)
ALWAYS place Python imports at the top of the file, not inside functions or methods
Files:
holmes/core/llm.py
🧠 Learnings (3)
📓 Common learnings
Learnt from: CR
Repo: HolmesGPT/holmesgpt PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-21T13:17:48.366Z
Learning: Applies to holmes/core/tool_calling_llm.py : Use LiteLLM for multi-provider support (OpenAI, Anthropic, Azure, etc.) in LLM integration
📚 Learning: 2025-12-21T13:17:48.366Z
Learnt from: CR
Repo: HolmesGPT/holmesgpt PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-21T13:17:48.366Z
Learning: Applies to holmes/core/tool_calling_llm.py : Use LiteLLM for multi-provider support (OpenAI, Anthropic, Azure, etc.) in LLM integration
Applied to files:
holmes/core/llm.py
📚 Learning: 2025-12-23T13:33:24.074Z
Learnt from: mainred
Repo: HolmesGPT/holmesgpt PR: 1231
File: holmes/utils/console/logging.py:42-44
Timestamp: 2025-12-23T13:33:24.074Z
Learning: In holmes/utils/console/logging.py, the `import litellm` statement is intentionally placed inline within the `suppress_noisy_logs()` function (not at the module top) to prevent unexpected log output during module initialization. The litellm library can produce debug logs at import time, so importing it inline allows `suppress_debug_info = True` to be set immediately after the import, before any logs are emitted.
Applied to files:
holmes/core/llm.py
🧬 Code graph analysis (1)
holmes/core/llm.py (3)
  examples/custom_llm.py (2)
    - context_window_size (17-18)
    - maximum_output_token (21-22)
  tests/conftest.py (2)
    - context_window_size (98-99)
    - maximum_output_token (102-103)
  tests/core/test_feedback.py (2)
    - context_window_size (23-24)
    - maximum_output_token (27-28)
🪛 Ruff (0.14.10)
holmes/core/llm.py
293-293: Consider moving this statement to an else block
(TRY300)
466-466: Consider moving this statement to an else block
(TRY300)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: llm_evals
- GitHub Check: build (3.10)
- GitHub Check: build (3.11)
- GitHub Check: build (3.12)
- GitHub Check: build
🔇 Additional comments (2)
holmes/core/llm.py (2)
277-304: Good: Correct None check and caching implementation.
The property correctly uses if self._max_context_size is not None: to check for cached values, which properly handles 0 as a valid cached value. The caching logic ensures the warning at line 298 will only appear once per instance, addressing the PR objective of avoiding repeated warnings.
95-103: Approve the caching approach for context window and output token limits.
The implementation successfully addresses the PR objectives by:
- Converting methods to properties for cleaner access patterns
- Caching lookup results to avoid repeated litellm model_cost lookups
- Ensuring warning messages (lines 298, 471) only appear once per instance
Once the critical initialization and truthiness check issues are addressed, this will effectively eliminate the repeated warnings described in the PR objectives.
Also applies to: 277-304, 445-477
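To tie the review threads together, here is a hedged sketch of how the cached maximum_output_token property could look once the None check and else-block suggestions are applied; the 4096 default, the warning text, and the simplified name-variant helper are assumptions, not the merged implementation:

```python
import logging
from typing import Optional

import litellm


class LLMSketch:
    """Illustrative only: combines the review's None-check and else-block fixes."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._max_output_tokens: Optional[int] = None  # initialized in __init__, per the review

    def _get_model_name_variants_for_lookup(self) -> list:
        # Simplified stand-in: the real helper yields several candidate names.
        return [self.name]

    @property
    def maximum_output_token(self) -> int:
        # Explicit None check so a cached value of 0 is still honored.
        if self._max_output_tokens is not None:
            return self._max_output_tokens
        max_output_tokens = 4096  # assumed default cap
        for name in self._get_model_name_variants_for_lookup():
            try:
                litellm_max_output_tokens = litellm.model_cost[name]["max_output_tokens"]
            except Exception:
                continue
            else:
                # Successful lookup: clamp to litellm's limit, cache, and return.
                if litellm_max_output_tokens < max_output_tokens:
                    max_output_tokens = litellm_max_output_tokens
                self._max_output_tokens = max_output_tokens
                return self._max_output_tokens
        # Lookup failed for every name variant: warn once and fall back.
        logging.warning(
            "Could not determine max output tokens for model %s; using %d",
            self.name,
            max_output_tokens,
        )
        self._max_output_tokens = max_output_tokens
        return self._max_output_tokens
```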
Dev Docker images are ready for this commit:
Use either tag to pull the image for testing.
@moshemorad @arikalon1 May I have a review of this PR? Thanks.
HolmesGPT tries to calculate the max context and output size from https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json based on the model name. For Azure OpenAI Service, however, a deployment is located by its deployment name, not the model name; the deployment name defaults to the model name, but this is not guaranteed. When the deployment name does not match the model name, HolmesGPT prints an annoying warning log like the one below on every LLM call. This is unnecessary, since the context window size does not change at runtime.
Also, we need to suppress the litellm debug info when verbose logging is not enabled.
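A minimal sketch of that suppression, following the inline-import rationale recorded in the review learnings; the function signature, verbose flag, and extra logger tweak are assumptions, while the real logic lives in holmes/utils/console/logging.py:

```python
import logging


def suppress_noisy_logs(verbose: bool = False) -> None:
    """Silence litellm's debug output unless verbose logging was requested."""
    if verbose:
        return
    # Imported inline on purpose: litellm can emit debug output at import time,
    # so the flag is set immediately after the import, before any logs appear.
    import litellm

    litellm.suppress_debug_info = True
    logging.getLogger("LiteLLM").setLevel(logging.WARNING)
```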
Tests
Before the change, running holmes printed the annoying warning log on every LLM call.
After the change, the warning log about the model name lookup is printed only once.
Summary by CodeRabbit
Refactor
Tests