feat: Input/output Dump + Replay Mode for API Logging Level 10 #2206
Conversation
📝 Walkthrough
Adds a Level 10 "Flight Recorder": crash-safe input/output tensor dumps (torch/safetensors), include/exclude filtering, session JSONL metadata, replay_from_dump/replay_sequence, and a CLI replay command.
Sequence Diagram(s)
sequenceDiagram
participant Client
participant Decorator as flashinfer_api
participant DumpSystem as Dump
participant Storage as JSONL/SafeTensors
participant ReplayEngine as Replay
Client->>Decorator: call API (Level 10)
Decorator->>DumpSystem: _dump_function_inputs()
DumpSystem->>Storage: write session.jsonl + inputs.(pt|safetensors)
Decorator->>Decorator: execute wrapped function
Decorator->>DumpSystem: _dump_function_outputs()
DumpSystem->>Storage: append outputs.(pt|safetensors) + metadata
Client->>ReplayEngine: replay_from_dump(dump_dir) / replay_sequence
ReplayEngine->>Storage: read session.jsonl and tensors
ReplayEngine->>ReplayEngine: reconstruct inputs, execute, compare outputs
ReplayEngine->>Client: report pass / fail / mismatch
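For orientation, a minimal producer-side sketch of this flow. The environment variable names come from this PR's documentation; the workload itself is a placeholder for any decorated FlashInfer call, and the dump directory name is illustrative.

```python
# Sketch only: Level 10 "Flight Recorder" producer.
import os

os.environ["FLASHINFER_LOGLEVEL"] = "10"          # enable input/output dumping
os.environ["FLASHINFER_DUMP_DIR"] = "./my_dumps"  # per-call dump root (illustrative name)

import flashinfer  # import after the env vars so the Level 10 configuration is picked up

# ... run the normal workload here; each decorated API call writes a
# timestamped sub-directory with inputs, outputs, and JSONL metadata.
# The dumps can later be replayed with replay_from_dump / replay_sequence
# or with the CLI replay command described in docs/logging.rst.
```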
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Summary of Changes
Hello @bkryu, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces a powerful new "Flight Recorder" mode (logging level 10) to the FlashInfer library, designed to enhance debugging and reproducibility. By enabling this mode, users can automatically dump all input and output tensors, along with call metadata, for any FlashInfer API call to disk. This allows for precise recreation of API execution environments.
Code Review
This pull request introduces a powerful 'Flight Recorder' mode (logging level 10) for dumping and replaying API calls, which is a fantastic feature for debugging and reproducibility. The implementation is very thorough, covering stateful class replays, CLI integration, and comprehensive testing. The documentation is also excellent and provides clear examples. I have one suggestion regarding Python version compatibility.
    return False
return all(
    _compare_results(a, e, rtol, atol)
    for a, e in zip(actual, expected, strict=True)
The strict=True argument for zip was introduced in Python 3.10. If this project supports older Python versions, this will raise a TypeError. Since you already perform a length check on lines 520-524, the strict=True is redundant for correctness and can be safely removed to ensure broader compatibility.
-    for a, e in zip(actual, expected, strict=True)
+    for a, e in zip(actual, expected)
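For readers on older interpreters, a small illustration of the compatible pattern the comment describes (the helper name is made up for the example): an explicit length check keeps the strictness guarantee while plain zip() works on Python < 3.10.

```python
def all_close_pairs(actual, expected, compare_one):
    # Explicit length check replaces zip(..., strict=True), which requires Python 3.10+.
    if len(actual) != len(expected):
        return False
    return all(compare_one(a, e) for a, e in zip(actual, expected))
```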
Actionable comments posted: 0
🧹 Nitpick comments (7)
docs/logging.rst (1)
450-450: Consider using a more realistic timestamp in the example path.
The timestamp 20251234_456789_012 contains an invalid date (day 34). While this is just an example, a valid-looking timestamp such as 20251204_143217_012 would be more consistent with the other examples in the documentation.
- dump_dir = Path("./bmm_fp8_dumps/20251234_456789_012_bmm_fp8_call0001")
+ dump_dir = Path("./bmm_fp8_dumps/20251204_143217_012_bmm_fp8_call0001")
flashinfer/api_logging.py (5)
270-272: Potential race condition when cleaning up empty directory.
If another thread/process writes to dump_dir between the size check failure and rmdir(), the operation will fail with OSError. Consider using shutil.rmtree with ignore_errors=True or wrapping in a try-except.
  # Clean up empty directory
- dump_dir.rmdir()
+ try:
+     dump_dir.rmdir()
+ except OSError:
+     pass  # Directory may have been modified by another process
  return None
341-346: Consider using logging.exception for automatic traceback inclusion.
Per static analysis hints, logging.exception automatically includes the traceback, making the code cleaner.
  except Exception as e:
-     _logger.error(f"Failed to dump function call {func_name}: {e}")
-     import traceback
-
-     _logger.error(traceback.format_exc())
+     _logger.exception(f"Failed to dump function call {func_name}: {e}")
      return None
437-441: Same suggestion: use logging.exception for cleaner traceback logging.
  except Exception as e:
-     _logger.error(f"Failed to dump outputs to {dump_dir}: {e}")
-     import traceback
-
-     _logger.error(traceback.format_exc())
+     _logger.exception(f"Failed to dump outputs to {dump_dir}: {e}")
778-783: Consider using logging.exception for execution failures.
This would automatically include the traceback.
  except Exception as e:
-     _logger.error(f"Execution failed: {e}")
-     import traceback
-
-     _logger.error(traceback.format_exc())
+     _logger.exception(f"Execution failed: {e}")
      result_dict["execution_error"] = str(e)
1356-1363: Consider using _logger.debug instead of _logger.info for output dump success.
Line 1360 uses _logger.info while the corresponding input dump success on line 1325 uses _logger.debug. For consistency, consider using the same log level.
  if _API_LOG_LEVEL >= 10 and dump_dir:
      try:
          _dump_function_outputs(dump_dir, result)
-         _logger.info(f"Outputs dumped to: {dump_dir}")
+         _logger.debug(f"Outputs dumped to: {dump_dir}")
      except Exception as e:
          _logger.error(f"[DUMP ERROR (outputs) in {func_name}]: {e}")
flashinfer/__main__.py (1)
376-397: Consider adding a count for mismatches in the summary.
Currently, mismatches and errors are both counted as "failed". For better debugging, consider tracking them separately.
  passed = 0
  failed = 0
+ mismatch = 0
  for i, res in enumerate(results):
      # ... existing code ...
      if "error" in res:
          # ... existing code ...
          failed += 1
      elif res.get("comparison_match"):
          # ... existing code ...
          passed += 1
      else:
          click.secho(f"[{i + 1}] {dump_name}: ⚠️ Mismatch", fg="yellow")
-         failed += 1
+         mismatch += 1
  click.secho(
-     f"\nSummary: {passed} passed, {failed} failed/mismatch", fg="white"
+     f"\nSummary: {passed} passed, {failed} failed, {mismatch} mismatch", fg="white"
  )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
docs/logging.rst (3 hunks)
flashinfer/__main__.py (1 hunks)
flashinfer/api_logging.py (9 hunks)
tests/utils/test_logging_replay.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
flashinfer/__main__.py (1)
flashinfer/api_logging.py (2)
replay_sequence (797-858)
replay_from_dump (547-794)
tests/utils/test_logging_replay.py (4)
flashinfer/utils.py (1)
get_compute_capability (253-256)
flashinfer/api_logging.py (1)
replay_sequence (797-858)
flashinfer/autotuner.py (1)
autotune (251-262)
flashinfer/__main__.py (1)
cli (68-73)
flashinfer/api_logging.py (1)
include/flashinfer/logging.h (1)
logging(31-41)
🪛 Ruff (0.14.8)
flashinfer/__main__.py
403-403: Do not catch blind exception: Exception
(BLE001)
flashinfer/api_logging.py
154-154: Do not catch blind exception: Exception
(BLE001)
322-322: Do not catch blind exception: Exception
(BLE001)
341-341: Do not catch blind exception: Exception
(BLE001)
342-342: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
345-345: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
437-437: Do not catch blind exception: Exception
(BLE001)
438-438: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
441-441: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
484-484: Consider moving this statement to an else block
(TRY300)
485-485: Do not catch blind exception: Exception
(BLE001)
597-597: Avoid specifying long messages outside the exception class
(TRY003)
602-602: Avoid specifying long messages outside the exception class
(TRY003)
614-614: Avoid specifying long messages outside the exception class
(TRY003)
724-724: Consider moving this statement to an else block
(TRY300)
725-725: Do not catch blind exception: Exception
(BLE001)
726-726: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
778-778: Do not catch blind exception: Exception
(BLE001)
779-779: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
782-782: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
818-818: Avoid specifying long messages outside the exception class
(TRY003)
853-853: Do not catch blind exception: Exception
(BLE001)
855-855: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
1314-1315: try-except-pass detected, consider logging the exception
(S110)
1314-1314: Do not catch blind exception: Exception
(BLE001)
1326-1326: Do not catch blind exception: Exception
(BLE001)
1327-1327: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
1353-1353: Do not catch blind exception: Exception
(BLE001)
1354-1354: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
1361-1361: Do not catch blind exception: Exception
(BLE001)
1362-1362: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Deploy Docs
🔇 Additional comments (16)
docs/logging.rst (2)
48-51: LGTM - Clear documentation for Level 10 Flight Recorder.
The logging levels table is well-structured and clearly explains the new Level 10 capabilities.
76-100: LGTM - Dump configuration is well-documented.
The environment variables table provides clear defaults and descriptions for the Level 10 dump behavior.
flashinfer/api_logging.py (7)
54-57: Thread-safety consideration for global dump state.
The global counters _dump_count, _dump_total_size_bytes, and _dump_call_counter are accessed and modified without synchronization. In multi-threaded environments (e.g., concurrent API calls), this could lead to race conditions causing inaccurate counts or size tracking. Consider using threading.Lock if thread-safety is required, or document that Level 10 logging is not thread-safe.
102-115: Good security warning for Level 10 dumping.
The explicit warning about sensitive data and the direct use of print() ensures visibility. This is appropriate for a security-related warning.
183-184: Tensor CPU transfer may cause GPU synchronization.
The arg.cpu() call triggers a CUDA synchronization. This is expected behavior for Level 10 logging, but worth noting that it will impact performance. The documentation already mentions Level 10 is for "Full Reproducibility / Debugging", which implies performance is not a concern.
473-490: LGTM - Dynamic function resolution with appropriate error handling.
The function properly handles nested class methods and logs warnings on failure without crashing.
547-594: Well-documented replay function with comprehensive parameter descriptions.
The docstring clearly explains all parameters and return values, including the stateful object replay mechanism via object_registry.
797-858: LGTM - Sequence replay with proper state management.
The function correctly maintains an object_registry across replay calls to handle stateful objects, and gracefully continues on individual failures.
1317-1328: Good crash-safe design: inputs dumped before execution.
Dumping inputs before calling the function ensures that debugging information is available even if the function crashes with a CUDA error.
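The ordering described here is the essence of the Flight Recorder's crash safety; a simplified sketch of the decorator flow (names abbreviated and the dump helpers hypothetical, not the actual implementation):

```python
def flight_recorder(func):
    # Simplified sketch: dump inputs first so a CUDA crash inside func()
    # still leaves a reproducible record on disk.
    def wrapper(*args, **kwargs):
        dump_dir = dump_inputs(func.__name__, args, kwargs)   # hypothetical helper
        result = func(*args, **kwargs)                        # may crash; inputs already saved
        dump_outputs(dump_dir, result)                        # hypothetical helper
        return result
    return wrapper
```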
flashinfer/__main__.py (2)
334-349: LGTM - Clean CLI command structure with proper validation.
The command validates the dump directory existence before proceeding and correctly distinguishes between single dumps and session directories.
403-404: Broad exception catch is appropriate here for CLI robustness.
For a CLI tool, catching all exceptions and displaying a user-friendly error message is the right approach to avoid exposing stack traces to end users.
tests/utils/test_logging_replay.py (5)
39-43: Module cleanup for reimport is necessary but has side effects.
Deleting modules from sys.modules forces reimport but may cause issues if other tests depend on cached module state. This is acceptable for isolated test fixtures.
46-77: Well-structured fixture with proper cleanup.
The fixture correctly saves and restores the original environment, and forces module reimport to pick up the new configuration.
80-114: Helper function is clear and reusable.
The verify_and_replay_dump helper encapsulates the common verification logic, making tests more readable. The filtering by function name is a useful feature for multi-call dumps.
327-348: Good integration test for CLI replay.
Testing the CLI command with CliRunner ensures the full integration path works correctly.
147-151: The review comment is incorrect. The mm_fp8 function is explicitly documented as "Only supported on Blackwell GPUs" in flashinfer/trtllm_low_latency_gemm.py (line 127), which uses the trtllm_low_latency_gemm backend that implements the actual computation. The compute capability check compute_capability[0] not in [10] correctly restricts execution to Blackwell (SM100/SM103, compute capability 10.x) and does not need adjustment. There is no Hopper (9.x) support, and the test skip message accurately reflects this hardware limitation.
Likely an incorrect or invalid review comment.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/logging.rst(3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Deploy Docs
🔇 Additional comments (7)
docs/logging.rst (7)
48-51: LGTM!
The table entry clearly presents the new Level 10 feature and its position in the logging hierarchy. The description effectively communicates the key benefit (full input/output dumps for reproducibility).
70-70: LGTM!
The environment variable documentation is updated consistently with the new logging level table.
150-194: Verify CLI replay output format and clarify loglevel behavior.
The example shows a comprehensive workflow, but a few details need verification:
- The example output format (lines 171–194) should be verified against the actual flashinfer replay CLI implementation.
- Line 166 sets FLASHINFER_LOGLEVEL=0 # 1 for more detailed replay results but doesn't explain the difference. Consider clarifying what level 0 vs. 1 produces during replay.
- The example command (line 162) is extensive and may be difficult for users to reproduce without the exact environment. Consider adding a note about prerequisites or simplifying to a minimal reproducible example if possible.
76-101: Documentation is accurate: all dump configuration variables match the implementation.
The environment variables in the documentation are correctly specified:
- FLASHINFER_DUMP_DIR (str, default "flashinfer_dumps") ✓
- FLASHINFER_DUMP_MAX_SIZE_GB (float, default 20) ✓
- FLASHINFER_DUMP_MAX_COUNT (int, default 1000) ✓
All variable names, default values, and type specifications match the implementation in flashinfer/api_logging.py and are validated by test code; a small sketch of these defaults follows below.
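A sketch of how these documented defaults map to code, useful when checking the docs against the implementation (variable names from the table above; the parsing shown is illustrative, not the module's exact code):

```python
import os

# Illustrative parsing of the documented Level 10 dump configuration.
DUMP_DIR = os.environ.get("FLASHINFER_DUMP_DIR", "flashinfer_dumps")
DUMP_MAX_SIZE_GB = float(os.environ.get("FLASHINFER_DUMP_MAX_SIZE_GB", "20"))
DUMP_MAX_COUNT = int(os.environ.get("FLASHINFER_DUMP_MAX_COUNT", "1000"))
```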
206-251: The bmm_fp8() call at line 246 is correct. The function signature in flashinfer/gemm/gemm_base.py accepts 7 parameters (5 required positional + 2 optional), and the example provides all required arguments in the correct order: bmm_fp8(input_fp8, mat2_fp8, input_inv_s, mat2_inv_s, res_dtype, res, backend="cublas"). No changes needed.
268-306: Verify replay_from_dump API and return structure.
The reproducer script relies on the replay_from_dump() function returning a dictionary with keys "args", "kwargs", and "expected_tensors". Please verify (a usage sketch follows below):
- The function signature matches: replay_from_dump(dump_dir, compare_outputs=True, device="cuda", run=False) (note: the parameter is dump_dir, not dir_path)
- The return value structure includes the expected keys ✓
- The return value res on line 292 is compatible with the bmm_fp8() call signature ✓
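A minimal usage sketch matching the signature and keys listed above (the dump path is hypothetical, error handling omitted):

```python
from flashinfer.api_logging import replay_from_dump

# Load a single dump without executing, then inspect what was recorded.
res = replay_from_dump("./my_dumps/20251204_143217_012_bmm_fp8_call0001",
                       device="cuda", run=False)
args = res["args"]                  # positional inputs, reconstructed on `device`
kwargs = res["kwargs"]              # keyword inputs
expected = res["expected_tensors"]  # outputs recorded by the producer run
```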
378-433: The reproducer script correctly uses the replay_sequence API. The function signature matches the expected parameters (root_dir and device="cuda"), and the result dictionaries contain all expected keys: "metadata" (with "function_name"), "dump_dir", "error" (optional), "comparison_match", and "execution_result". The data access patterns in the script are correct and align with the actual implementation.
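Based on the result keys listed above, a short sketch of consuming replay_sequence output (directory name illustrative):

```python
from flashinfer.api_logging import replay_sequence

results = replay_sequence("./my_dumps", device="cuda")
for res in results:
    name = res["metadata"]["function_name"]
    if "error" in res:
        print(f"{name}: ERROR {res['error']}")
    elif res.get("comparison_match"):
        print(f"{name}: outputs match")
    else:
        print(f"{name}: mismatch vs recorded outputs")
```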
# Create workspace and wrapper - __init__ will be logged
workspace_buffer = torch.empty(32 * 1024 * 1024, dtype=torch.int8, device="cuda")
wrapper = flashinfer.decode.BatchDecodeWithPagedKVCacheWrapper(workspace_buffer, kv_layout)
# Plan - will be logged
wrapper.plan(
    kv_indptr,
    kv_indices,
    kv_last_page_len,
    num_qo_heads,
    num_kv_heads,
    head_dim,
    page_size,
    data_type=torch.float16,
    q_data_type=torch.float16,
)
# Run - will be logged
output, lse = wrapper.run(q, kv_data, return_lse=True)
# Print a small portion of the output
print("Output shape:", output.shape)
print("Output[0, :3, :3]:")
print(output[0, :3, :3])
print("\nLSE shape:", lse.shape)
print("LSE[0, :5]:", lse[0, :5])
🧩 Analysis chain
🏁 Script executed:
find . -type f \( -name "*.py" -o -name "*.pyx" -o -name "*.pxd" \) | head -20
Repository: flashinfer-ai/flashinfer
Length of output: 968
🏁 Script executed:
rg "class BatchDecodeWithPagedKVCacheWrapper" -A 30 --type=pyRepository: flashinfer-ai/flashinfer
Length of output: 1880
🏁 Script executed:
rg "def plan\(" -B 5 -A 20 --type=py | grep -A 20 "BatchDecodeWithPagedKVCache"Repository: flashinfer-ai/flashinfer
Length of output: 50
🏁 Script executed:
rg "def __init__" -A 15 flashinfer/decode.py | head -50Repository: flashinfer-ai/flashinfer
Length of output: 1993
🏁 Script executed:
rg "class BatchDecodeWithPagedKVCacheWrapper" -A 100 flashinfer/decode.py | grep -E "def (plan|run|__init__)" -A 10Repository: flashinfer-ai/flashinfer
Length of output: 529
🏁 Script executed:
sed -n '/class BatchDecodeWithPagedKVCacheWrapper/,/^class /p' flashinfer/decode.py | head -300
Repository: flashinfer-ai/flashinfer
Length of output: 13143
🏁 Script executed:
rg "def run\(" -A 30 flashinfer/decode.py | head -80Repository: flashinfer-ai/flashinfer
Length of output: 3186
Fix BatchDecodeWithPagedKVCacheWrapper API usage in the producer script.
The example has several API inconsistencies with the actual implementation:
-
Constructor parameter name (line 354): The first parameter is
float_workspace_buffer, not workspace_buffer. Additionally, the tensor should be a float workspace buffer, but line 353 creates it with dtype=torch.int8; it should be float or uint8 for a general workspace buffer.
Deprecated parameter in plan() (line 361): The
data_typeparameter is deprecated. Useq_data_typeandkv_data_typeseparately instead, or rely onq_data_typealone (which defaults totorch.float16). -
run() method parameter name (line 369): The second parameter is
paged_kv_cache, not kv_data. The signature supports return_lse=True correctly.
🤖 Prompt for AI Agents
In docs/logging.rst around lines 351-376, update the
BatchDecodeWithPagedKVCacheWrapper usage: rename the workspace tensor variable
and constructor argument to float_workspace_buffer and create the tensor with an
appropriate float/uint8 dtype (not torch.int8); in the wrapper.plan(...) call
remove the deprecated data_type argument and supply q_data_type (and
kv_data_type if needed) instead; and in wrapper.run(...) pass the paged_kv_cache
as the second argument (rename kv_data to paged_kv_cache) while keeping
return_lse=True. Ensure names and types match the current API.
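Putting the three fixes together, the producer example would look roughly like this (a sketch following the parameter names given in the review, not a verified snippet; the surrounding variables come from the original example):

```python
import torch
import flashinfer

# Fix 1: float workspace buffer with a non-int8 dtype and the documented parameter name.
float_workspace_buffer = torch.empty(32 * 1024 * 1024, dtype=torch.uint8, device="cuda")
wrapper = flashinfer.decode.BatchDecodeWithPagedKVCacheWrapper(float_workspace_buffer, kv_layout)

# Fix 2: drop the deprecated data_type argument; pass q_data_type / kv_data_type instead.
wrapper.plan(
    kv_indptr,
    kv_indices,
    kv_last_page_len,
    num_qo_heads,
    num_kv_heads,
    head_dim,
    page_size,
    q_data_type=torch.float16,
    kv_data_type=torch.float16,
)

# Fix 3: the second run() argument is the paged KV cache.
output, lse = wrapper.run(q, paged_kv_cache, return_lse=True)
```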
Force-pushed: 8519e13 → 2aa395d → 3edb449 → f6e1922
/bot run
This function:
1. Extracts tensors and metadata from inputs
2. Creates a timestamped directory
3. Saves inputs.pt and partial metadata.json
This is great feedback. In the latest commit, I switched:
- from metadata.json to metadata.jsonl.
- from only [input, output].pt to giving an option to dump as safetensors via FLASHINFER_DUMP_SAFETENSORS. As we discussed, dumping via torch .pt retains all non-contiguity and custom strides while safetensors loses them, so it's likely best to keep both (a small demonstration follows below).
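To illustrate the trade-off being discussed, a small self-contained check (a standalone sketch, not code from this PR): torch.save round-trips a non-contiguous view with its strides, while safetensors requires a contiguous copy, so the reloaded tensor loses the original layout.

```python
import torch

x = torch.arange(12, dtype=torch.float32).reshape(3, 4).t()  # non-contiguous view
print(x.is_contiguous(), x.stride())          # False, (1, 4)

# torch.save / torch.load preserve the layout
torch.save({"x": x}, "demo.pt")
y = torch.load("demo.pt")["x"]
print(y.is_contiguous(), y.stride())          # False, (1, 4)

# safetensors needs contiguous data, so strides are normalized on save
from safetensors.torch import save_file, load_file
save_file({"x": x.contiguous()}, "demo.safetensors")
z = load_file("demo.safetensors")["x"]
print(z.is_contiguous(), z.stride())          # True, (3, 1)
```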
We can generate a separate jsonl file with another workflow in flashinfer-bench.
[FAILED] Pipeline #41318472: 7/20 passed
**Producer Script** (``batch_decode_producer.py``):
This script demonstrates logging with a stateful wrapper class that requires ``__init__``, ``plan``, and ``run`` calls.
What if the dump folder contains multiple init, plan, and run calls? Is there a way to pick any combination of them in replay mode, e.g. --init init_0 --plan plan_3 --run run_2?
When multiple wrapper instances are created with interleaved plan()/run() calls, the system correctly tracks them:
- Each dump's metadata.jsonl contains self_id (Python's id(self)) and process_id
replay_sequence(), theobject_registryuses composite keys(process_id, self_id)to maintain separate instances- This composite keys
(process_id, self_id)was been added in the past few days, after we tried out this PR last month to debug the MLA IMA issue and struggled with confusion 😓
- This composite keys
- Chronological replay automatically routes each plan()/run() to the correct instance
For example, with two interleaved instances:
__init__ (self_id=12345) → registry[(pid, 12345)] = new instance A
__init__ (self_id=67890) → registry[(pid, 67890)] = new instance B
plan() (pid, self_id=12345) → calls A.plan()
plan() (pid, self_id=67890) → calls B.plan()
run() (pid, self_id=67890) → calls B.run()
run() (pid, self_id=12345) → calls A.run()
Regarding mixing the function calls from different wrapper instances (--init init_0 --plan plan_3 --run run_2), the current implementation does not support it because it would not be a valid replay. One could potentially change the (pid, self_id) in the metadata files and force a replay, but combining calls from different instances would likely fail or produce incorrect results.
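A condensed sketch of the dispatch logic described above (the registry and metadata field names follow this discussion; the loop, sorting helper, and `instantiate` function are illustrative, not the module's exact code):

```python
# Illustrative replay dispatch keyed by (process_id, self_id), as described above.
object_registry = {}

for call in calls_sorted_by_timestamp:          # each entry parsed from metadata.jsonl
    key = (call["process_id"], call["self_id"])
    cls_name, method = call["function_name"].split(".")  # e.g. "BatchDecodeWithPagedKVCacheWrapper.plan"
    if method == "__init__":
        object_registry[key] = instantiate(cls_name, call["args"], call["kwargs"])  # hypothetical helper
    else:
        instance = object_registry[key]         # routes plan()/run() to the right wrapper
        getattr(instance, method)(*call["args"], **call["kwargs"])
```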
The composite key is good enough! LGTM.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @flashinfer/api_logging.py:
- Around line 262-266: The broad except block that returns a fallback for
un-serializable values swallows errors and offers no diagnostics; change the
`except Exception:` to `except Exception as e:` and log the exception
(preferably via the module logger or `logging.exception`) including the value
and traceback before returning the fallback dict (which can keep `"type":
type(value).__name__` and `"repr": "<not serializable>"`). Ensure you reference
the same local symbols (`value`) and preserve the existing return shape while
adding the logging call so serialization failures are recorded.
- Around line 403-407: Replace calls to _logger.error(...) inside exception
handlers with _logger.exception(...) so the traceback is captured when
re-raising; for example change the block that logs "safetensors package not
installed. Install with: pip install safetensors" to use _logger.exception with
the same message before the raise, and apply the same change for the other
exception handlers referenced (the _logger.error occurrences around the
safetensors message and the ones at the other listed places). Keep the existing
messages and re-raise behavior unchanged—only swap _logger.error to
_logger.exception so traceback is included.
🧹 Nitpick comments (2)
benchmarks/routines/gemm.py (1)
1041-1046: Consider making trtllm quantization conditional to avoid wasted computation.
The mat2_fp4_trtllm quantization is now computed unconditionally, but it's only used when backend == "trtllm" (lines 1072, 1110). If trtllm is not in the backends list (e.g., filtered out by compute capability or user choice), this quantization step is wasted.
⚡ Suggested optimization
Move the quantization inside a conditional block or compute it lazily:
- mat2_fp4_trtllm, mat2_inv_s_trtllm = flashinfer.nvfp4_quantize(
-     mat2,
-     global_sf_mat2,
-     sfLayout=flashinfer.SfLayout.layout_128x4,
-     do_shuffle=True,
- )
+ # Compute trtllm-specific quantization only if needed
+ mat2_fp4_trtllm, mat2_inv_s_trtllm = None, None
+ if "trtllm" in backends:
+     mat2_fp4_trtllm, mat2_inv_s_trtllm = flashinfer.nvfp4_quantize(
+         mat2,
+         global_sf_mat2,
+         sfLayout=flashinfer.SfLayout.layout_128x4,
+         do_shuffle=True,
+     )
Then update lines 1072 and 1110 to handle None gracefully, or ensure trtllm is in backends before those lines execute.
flashinfer/__main__.py (1)
455-457: Improve error reporting for replay failures.
The broad except Exception catch at line 455 hides the traceback, making it difficult to diagnose replay failures. Consider logging the full exception details.
🔧 Suggested improvement
  except Exception as e:
-     click.secho(f"❌ Replay failed: {e}", fg="red")
+     import traceback
+     click.secho(f"❌ Replay failed: {e}", fg="red")
+     if os.environ.get("FLASHINFER_LOGLEVEL", "0") != "0":
+         click.secho(traceback.format_exc(), fg="red")
This preserves the simple error message for users but provides detailed diagnostics when logging is enabled.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
benchmarks/routines/gemm.py
docs/logging.rst
flashinfer/__main__.py
flashinfer/api_logging.py
tests/utils/test_logging_replay.py
🧰 Additional context used
📓 Path-based instructions (2)
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Test implementations should use flashinfer.utils functions (get_compute_capability, is_sm90a_supported, is_sm100a_supported, etc.) to skip tests on unsupported GPU architectures
For testing with mpirun on multi-GPU systems, use the pattern: mpirun -np <num_gpus> pytest tests/path/to/test.py::test_function
Avoid OOM (out-of-memory) errors in tests by using appropriate problem sizes - tests/conftest.py provides auto-skipping for OOM tests as a safety net but should not be relied upon
Files:
tests/utils/test_logging_replay.py
flashinfer/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
flashinfer/**/*.py: Use the @functools.cache decorator on Python API functions to implement module-level caching and avoid recompilation
Use the @flashinfer_api decorator for debugging API calls, enable via the FLASHINFER_LOGLEVEL environment variable (0=off, 1=basic, 3=detailed, 5=with stats)
Files:
flashinfer/api_logging.py
flashinfer/__main__.py
🧬 Code graph analysis (3)
tests/utils/test_logging_replay.py (2)
flashinfer/utils.py (1)
get_compute_capability (258-261)
flashinfer/api_logging.py (2)
replay_sequence (987-1049)
replay_from_dump (700-984)
flashinfer/api_logging.py (1)
tests/utils/test_logging.py (1)
run(297-298)
benchmarks/routines/gemm.py (1)
flashinfer/fp4_quantization.py (2)
nvfp4_quantize (816-873)
SfLayout (805-812)
🪛 Ruff (0.14.10)
flashinfer/api_logging.py
262-262: Do not catch blind exception: Exception
(BLE001)
403-406: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
457-457: Do not catch blind exception: Exception
(BLE001)
482-482: Do not catch blind exception: Exception
(BLE001)
483-483: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
486-486: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
590-590: Do not catch blind exception: Exception
(BLE001)
591-591: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
594-594: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
637-637: Consider moving this statement to an else block
(TRY300)
638-638: Do not catch blind exception: Exception
(BLE001)
752-752: Avoid specifying long messages outside the exception class
(TRY003)
757-757: Avoid specifying long messages outside the exception class
(TRY003)
761-761: Avoid specifying long messages outside the exception class
(TRY003)
777-780: Avoid specifying long messages outside the exception class
(TRY003)
782-784: Avoid specifying long messages outside the exception class
(TRY003)
855-858: Avoid specifying long messages outside the exception class
(TRY003)
914-914: Consider moving this statement to an else block
(TRY300)
915-915: Do not catch blind exception: Exception
(BLE001)
916-916: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
968-968: Do not catch blind exception: Exception
(BLE001)
969-969: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
972-972: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
1008-1008: Avoid specifying long messages outside the exception class
(TRY003)
1044-1044: Do not catch blind exception: Exception
(BLE001)
1046-1046: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
1505-1506: try-except-pass detected, consider logging the exception
(S110)
1505-1505: Do not catch blind exception: Exception
(BLE001)
1517-1517: Do not catch blind exception: Exception
(BLE001)
1518-1518: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
1544-1544: Do not catch blind exception: Exception
(BLE001)
1545-1545: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
1552-1552: Do not catch blind exception: Exception
(BLE001)
1553-1553: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
flashinfer/__main__.py
455-455: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Deploy Docs
🔇 Additional comments (4)
flashinfer/api_logging.py (2)
876-933: Excellent multi-process safety design.
The use of composite keys (process_id, self_id) for the object registry (line 882) is a well-thought-out design that prevents collisions in multi-GPU/multi-process environments where different processes may have objects at the same memory address.
114-136: Good security practice with the dump warning.
The _warn_dump() function provides clear warnings about the security implications of Level 10 logging, which is essential for preventing accidental leakage of sensitive data.
docs/logging.rst (1)
1-678: Excellent documentation for Level 10 logging.
The documentation is comprehensive, well-structured, and provides practical examples for both producer and reproducer scripts. The coverage of SafeTensors trade-offs, dump filtering patterns, and JSONL format details is particularly helpful.
tests/utils/test_logging_replay.py (1)
tests/utils/test_logging_replay.py (1)
1-1067: Comprehensive test coverage for Level 10 logging.This test suite provides excellent coverage of the Level 10 dump/replay functionality:
- Multiple FlashInfer APIs (mm_fp8, bmm_fp8, mm_fp4, prefill/decode)
- Dump filtering (include, exclude, combined patterns)
- JSONL format verification
- SafeTensors support
- CLI replay path
The use of the
level10_environmentfixture ensures proper isolation between tests, and theverify_and_replay_dumphelper provides consistent verification logic.
except Exception:
    return {
        "type": type(value).__name__,
        "repr": "<not serializable>",
    }
🛠️ Refactor suggestion | 🟠 Major
Improve exception handling in serialization.
The broad except Exception: catch silently returns a fallback value without logging the error. This makes debugging serialization issues difficult.
🔧 Suggested improvement
except Exception:
+ _logger.warning(f"Failed to serialize {type(value).__name__}: {value!r}", exc_info=True)
return {
"type": type(value).__name__,
"repr": "<not serializable>",
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
except Exception:
    return {
        "type": type(value).__name__,
        "repr": "<not serializable>",
    }
becomes:
except Exception:
    _logger.warning(f"Failed to serialize {type(value).__name__}: {value!r}", exc_info=True)
    return {
        "type": type(value).__name__,
        "repr": "<not serializable>",
    }
🧰 Tools
🪛 Ruff (0.14.10)
262-262: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
In @flashinfer/api_logging.py around lines 262 - 266, The broad except block
that returns a fallback for un-serializable values swallows errors and offers no
diagnostics; change the `except Exception:` to `except Exception as e:` and log
the exception (preferably via the module logger or `logging.exception`)
including the value and traceback before returning the fallback dict (which can
keep `"type": type(value).__name__` and `"repr": "<not serializable>"`). Ensure
you reference the same local symbols (`value`) and preserve the existing return
shape while adding the logging call so serialization failures are recorded.
_logger.error(
    "safetensors package not installed. "
    "Install with: pip install safetensors"
)
raise
🛠️ Refactor suggestion | 🟠 Major
Use _logger.exception for better error context.
When logging errors in exception handlers, use _logger.exception() instead of _logger.error() to automatically include the traceback.
🔧 Suggested fix
except ImportError:
- _logger.error(
+ _logger.exception(
"safetensors package not installed. "
"Install with: pip install safetensors"
)
Apply the same pattern to other exception handlers at lines 483-486, 591-594, 916, 969-972, 1046, 1518, 1553.
🧰 Tools
🪛 Ruff (0.14.10)
403-406: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
🤖 Prompt for AI Agents
In @flashinfer/api_logging.py around lines 403 - 407, Replace calls to
_logger.error(...) inside exception handlers with _logger.exception(...) so the
traceback is captured when re-raising; for example change the block that logs
"safetensors package not installed. Install with: pip install safetensors" to
use _logger.exception with the same message before the raise, and apply the same
change for the other exception handlers referenced (the _logger.error
occurrences around the safetensors message and the ones at the other listed
places). Keep the existing messages and re-raise behavior unchanged—only swap
_logger.error to _logger.exception so traceback is included.
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In @flashinfer/api_logging.py:
- Around line 66-69: The global counters _dump_count, _dump_total_size_bytes,
and _dump_call_counter are modified without synchronization and can race in
multi-threaded use; introduce a module-level threading.Lock (e.g.,
_dump_state_lock) and guard all reads/writes/updates to those globals
(increments, size adds, and per-function counter updates) with with
_dump_state_lock: to ensure atomicity, and remove the unused
_session_jsonl_initialized variable if it is not referenced elsewhere in the
module.
- Around line 530-541: The output dumping branch that uses safetensors (guarded
by _DUMP_SAFETENSORS in the output_tensors block) lacks ImportError handling;
wrap the "from safetensors.torch import save_file" import in a try/except
ImportError and log a clear error via the same logger used elsewhere (or raise a
RuntimeError with a descriptive message) and fall back to the torch.save path if
desired; ensure the catch references the same symbols (save_file,
tensors_contiguous, dump_path) and mirrors the import-error handling used in
_dump_function_inputs so missing safetensors produces a clear message rather
than an unhandled exception.
🧹 Nitpick comments (8)
flashinfer/api_logging.py (8)
392-408: Improve safetensors import error handling.
When safetensors is not installed, the code logs an error and re-raises, but uses generic _logger.error instead of _logger.exception, which would include the full traceback automatically.
As per static analysis:
  if _DUMP_SAFETENSORS:
      # SafeTensors format: faster, no pickle, but loses stride/contiguity
      try:
          from safetensors.torch import save_file
          # safetensors requires contiguous tensors
          tensors_contiguous = {
              k: v.contiguous() for k, v in input_tensors.items()
          }
          save_file(tensors_contiguous, str(dump_dir / "inputs.safetensors"))
      except ImportError:
-         _logger.error(
+         _logger.exception(
              "safetensors package not installed. "
              "Install with: pip install safetensors"
          )
          raise
482-487: Use logging.exception for better error diagnostics.
The code manually calls traceback.format_exc() and logs it. Python's logging.exception() does this automatically and is the preferred pattern.
  except Exception as e:
-     _logger.error(f"Failed to dump function call {func_name}: {e}")
-     import traceback
-
-     _logger.error(traceback.format_exc())
+     _logger.exception(f"Failed to dump function call {func_name}: {e}")
      return None
Same pattern applies to other exception handlers throughout the file (lines 591-594, 969-972, etc.).
590-594: Use logging.exception instead of manual traceback formatting.
  except Exception as e:
-     _logger.error(f"Failed to dump outputs to {dump_dir}: {e}")
-     import traceback
-
-     _logger.error(traceback.format_exc())
+     _logger.exception(f"Failed to dump outputs to {dump_dir}: {e}")
626-643: Consider caching resolved functions.
_resolve_function is called for each replay, which involves module import and attribute traversal. For replay sequences with many calls to the same function, consider caching the resolved function objects.
from functools import lru_cache

@lru_cache(maxsize=128)
def _resolve_function(module_name: str, function_name: str) -> Optional[Callable]:
    """Resolve a function from module name and function name (cached)."""
    try:
        module = importlib.import_module(module_name)
        # Handle nested function names (e.g. Class.method)
        parts = function_name.split(".")
        obj: Any = module
        for part in parts:
            obj = getattr(obj, part)
        if not callable(obj):
            return None
        return obj
    except Exception as e:
        _logger.warning(
            f"Could not resolve function {module_name}.{function_name}: {e}"
        )
        return None
Note: This caching is optional since resolution is typically fast, but could help in large replay sequences.
Based on coding guidelines, @functools.cache is preferred.
968-973: Use logging.exception for cleaner error logging.
  except Exception as e:
-     _logger.error(f"Execution failed: {e}")
-     import traceback
-
-     _logger.error(traceback.format_exc())
+     _logger.exception(f"Execution failed: {e}")
      result_dict["execution_error"] = str(e)
750-784: Consider extracting tensor loading logic.The tensor loading code (lines 765-788) is duplicated later in the function (lines 838-862) with slight variations. Consider extracting this into a helper function to reduce duplication.
def _load_tensors(dump_path: Path, tensor_type: str, device: str) -> Dict[str, torch.Tensor]:
    """Load tensors from dump directory (supports both .pt and .safetensors)."""
    pt_path = dump_path / f"{tensor_type}.pt"
    safetensors_path = dump_path / f"{tensor_type}.safetensors"
    if pt_path.exists():
        tensors = torch.load(str(pt_path), map_location="cpu")
    elif safetensors_path.exists():
        try:
            from safetensors.torch import load_file
            tensors = load_file(str(safetensors_path), device="cpu")
        except ImportError:
            raise ImportError(
                f"Dump was saved with safetensors but package not installed. "
                "Install with: pip install safetensors"
            ) from None
    else:
        raise FileNotFoundError(
            f"Neither {tensor_type}.pt nor {tensor_type}.safetensors found in {dump_path}"
        )
    # Move to target device
    return {key: tensor.to(device) for key, tensor in tensors.items()}
Then use:
input_tensors = _load_tensors(dump_path, "inputs", device)
1505-1515: Consider logging when class detection fails.
The try-except block silently passes when class name detection fails (lines 1514-1515). While this is likely rare, logging at debug level could help diagnose issues.
  if args and hasattr(args[0], "__class__"):
      try:
          class_name = args[0].__class__.__name__
          if "Wrapper" in class_name or class_name in [
              "BatchMLAPagedAttentionWrapper"
          ]:
              func_name = f"{class_name}.{func_name}"
              self_id = id(args[0])
      except Exception:
-         pass
+         _logger.debug(f"Could not determine class name for {f.__name__}")
1556-1563: Use logging.exception for dump errors.
  if _API_LOG_LEVEL >= 10 and dump_dir:
      try:
          _dump_function_outputs(dump_dir, result)
          _logger.info(f"Outputs dumped to: {dump_dir}")
      except Exception as e:
-         _logger.error(f"[DUMP ERROR (outputs) in {func_name}]: {e}")
+         _logger.exception(f"[DUMP ERROR (outputs) in {func_name}]: {e}")
Same for line 1527.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
flashinfer/api_logging.py
🧰 Additional context used
📓 Path-based instructions (1)
flashinfer/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
flashinfer/**/*.py: Use the @functools.cache decorator on Python API functions to implement module-level caching and avoid recompilation
Use the @flashinfer_api decorator for debugging API calls, enable via the FLASHINFER_LOGLEVEL environment variable (0=off, 1=basic, 3=detailed, 5=with stats)
Files:
flashinfer/api_logging.py
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: flashinfer-ai/flashinfer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-30T09:34:39.900Z
Learning: Applies to flashinfer/**/*.py : Use `flashinfer_api` decorator for debugging API calls, enable via `FLASHINFER_LOGLEVEL` environment variable (0=off, 1=basic, 3=detailed, 5=with stats)
📚 Learning: 2025-12-30T09:34:39.900Z
Learnt from: CR
Repo: flashinfer-ai/flashinfer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-30T09:34:39.900Z
Learning: Applies to flashinfer/**/*.py : Use `flashinfer_api` decorator for debugging API calls, enable via `FLASHINFER_LOGLEVEL` environment variable (0=off, 1=basic, 3=detailed, 5=with stats)
Applied to files:
flashinfer/api_logging.py
📚 Learning: 2025-12-30T09:34:39.900Z
Learnt from: CR
Repo: flashinfer-ai/flashinfer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-30T09:34:39.900Z
Learning: Applies to flashinfer/**/*.py : Use `functools.cache` decorator on Python API functions to implement module-level caching and avoid recompilation
Applied to files:
flashinfer/api_logging.py
📚 Learning: 2025-12-30T09:34:39.900Z
Learnt from: CR
Repo: flashinfer-ai/flashinfer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-30T09:34:39.900Z
Learning: Use `FLASHINFER_CUDA_ARCH_LIST` environment variable to specify target GPU architectures (e.g., '8.0 9.0a') and `FLASHINFER_NVCC_THREADS` to control parallel compilation threads
Applied to files:
flashinfer/api_logging.py
🪛 Ruff (0.14.10)
flashinfer/api_logging.py
262-262: Do not catch blind exception: Exception
(BLE001)
403-406: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
457-457: Do not catch blind exception: Exception
(BLE001)
482-482: Do not catch blind exception: Exception
(BLE001)
483-483: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
486-486: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
590-590: Do not catch blind exception: Exception
(BLE001)
591-591: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
594-594: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
637-637: Consider moving this statement to an else block
(TRY300)
638-638: Do not catch blind exception: Exception
(BLE001)
752-752: Avoid specifying long messages outside the exception class
(TRY003)
757-757: Avoid specifying long messages outside the exception class
(TRY003)
761-761: Avoid specifying long messages outside the exception class
(TRY003)
777-780: Avoid specifying long messages outside the exception class
(TRY003)
782-784: Avoid specifying long messages outside the exception class
(TRY003)
855-858: Avoid specifying long messages outside the exception class
(TRY003)
914-914: Consider moving this statement to an else block
(TRY300)
915-915: Do not catch blind exception: Exception
(BLE001)
916-916: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
968-968: Do not catch blind exception: Exception
(BLE001)
969-969: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
972-972: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
1008-1008: Avoid specifying long messages outside the exception class
(TRY003)
1044-1044: Do not catch blind exception: Exception
(BLE001)
1046-1046: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
1514-1515: try-except-pass detected, consider logging the exception
(S110)
1514-1514: Do not catch blind exception: Exception
(BLE001)
1526-1526: Do not catch blind exception: Exception
(BLE001)
1527-1527: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
1553-1553: Do not catch blind exception: Exception
(BLE001)
1554-1554: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
1561-1561: Do not catch blind exception: Exception
(BLE001)
1562-1562: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Deploy Docs
🔇 Additional comments (10)
flashinfer/api_logging.py (10)
114-136: LGTM: Comprehensive security warning for Level 10 dumping.
The warning appropriately alerts users about security implications of dumping tensors to disk and provides clear guidance on configuration options.
138-176: LGTM: Clear filtering logic with good documentation.
The include/exclude pattern filtering is well-documented and follows a sensible precedence: include patterns are checked first (if set), then exclude patterns.
230-267: LGTM: Robust serialization with appropriate truncation.
The serialization handles special types (torch.dtype, enum.Enum) and gracefully truncates long structures to prevent excessive metadata size. Good fallback to <not serializable> for problematic values.
597-624: LGTM: Robust value reconstruction.
Handles torch.dtype reconstruction correctly by parsing strings like "torch.bfloat16". Falls back gracefully when dtype cannot be resolved.
645-698: LGTM: Comprehensive result comparison logic.
The recursive comparison handles tensors (with numerical tolerances), collections, and fallback types. Good use of torch.allclose with configurable rtol and atol.
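For reference, the general shape of such a recursive comparison (a generic sketch with the tolerances mentioned here, not the module's exact code):

```python
import torch

def compare(actual, expected, rtol=1e-3, atol=1e-3):
    # Tensors: numerical comparison with configurable tolerances.
    if isinstance(expected, torch.Tensor):
        return isinstance(actual, torch.Tensor) and torch.allclose(
            actual, expected, rtol=rtol, atol=atol
        )
    # Collections: compare element-wise after a length check.
    if isinstance(expected, (list, tuple)):
        if not isinstance(actual, (list, tuple)) or len(actual) != len(expected):
            return False
        return all(compare(a, e, rtol, atol) for a, e in zip(actual, expected))
    # Fallback: plain equality for scalars and other types.
    return actual == expected
```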
876-918: LGTM: Robust stateful replay with composite keys.
The use of (process_id, self_id) as composite keys prevents collisions in multi-GPU environments. The special handling for __init__ to instantiate objects is well-implemented.
987-1050: LGTM: Well-designed sequence replay with error resilience.
The function sorts dumps by timestamp and maintains a shared object registry across all replays. Error handling allows the sequence to continue even if individual replays fail, which is good for debugging.
1517-1528: Good crash-safe design with pre-execution dumps.
Level 10 dumps inputs before function execution, ensuring that debugging information is preserved even if the function crashes with CUDA errors or other failures.
1536-1540: LGTM: Effective level capping for logging.
Capping the effective logging level at 5 prevents Level 10's dump operations from interfering with the regular logging output format.
1125-1125: LGTM: Immediate security warning on module import.
Calling _warn_dump() at module load time ensures users are immediately aware when Level 10 dumping is enabled, before any sensitive data is processed.
import enum
import fnmatch
import functools
import inspect
import json
import logging
import os
import sys
from typing import Any, Callable
from datetime import datetime
from pathlib import Path
from typing import Any, Callable, Dict, Tuple, Optional
import contextlib
import importlib
import torch
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for threading usage in FlashInfer codebase
echo "=== Searching for threading patterns ==="
rg -n "threading|Thread\(|Pool\(|ThreadPool|concurrent\.futures" --type=py -g '!test*' -g '!*_test.py' | head -30
echo ""
echo "=== Searching for existing global state with locks ==="
rg -n "threading\.Lock|threading\.RLock|Lock()" --type=py | head -30
echo ""
echo "=== Checking for multiprocessing usage ==="
rg -n "multiprocessing|Process\(" --type=py -g '!test*' | head -20Repository: flashinfer-ai/flashinfer
Length of output: 1453
🏁 Script executed:
#!/bin/bash
# Examine the api_logging.py file to understand global state
echo "=== Examining flashinfer/api_logging.py structure ==="
wc -l flashinfer/api_logging.py
echo ""
echo "=== Checking global variable declarations ==="
rg -n "^[A-Za-z_][A-Za-z0-9_]*\s*=" flashinfer/api_logging.py | head -50
echo ""
echo "=== Checking for any Level 10 references in codebase ==="
rg -n "Level.*10|level.*10" --type=py | head -20Repository: flashinfer-ai/flashinfer
Length of output: 3200
🏁 Script executed:
#!/bin/bash
# Search for usage of api_logging module to understand threading context
echo "=== Searching for imports of api_logging ==="
rg -n "from flashinfer.*api_logging|import.*api_logging" --type=py | head -20
echo ""
echo "=== Checking if there are any test files with threading ==="
find . -name "*test*.py" -o -name "test_*.py" | xargs rg -l "threading|Thread|concurrent" 2>/dev/null | head -10Repository: flashinfer-ai/flashinfer
Length of output: 809
🏁 Script executed:
#!/bin/bash
# Examine how global state is modified in api_logging.py
echo "=== Searching for modifications to _dump_count and related globals ==="
rg -n "_dump_count|_dump_total_size_bytes|_dump_call_counter|_session_jsonl_initialized" flashinfer/api_logging.py
echo ""
echo "=== Examining the _dump_api wrapper/decorator structure ==="
grep -n "def _dump_api\|def flashinfer_api" flashinfer/api_logging.py | head -5
echo ""
echo "=== Checking fused_moe/utils.py to see threading.local() pattern ==="
head -100 flashinfer/fused_moe/utils.py | tail -70
Repository: flashinfer-ai/flashinfer
Length of output: 3020
🏁 Script executed:
#!/bin/bash
# Check if any file writing happens without locks
echo "=== Searching for file writes in api_logging that might be concurrent ==="
rg -n "open\(|\.write\(|\.dump\(|jsonl" flashinfer/api_logging.py | head -40Repository: flashinfer-ai/flashinfer
Length of output: 1844
🏁 Script executed:
#!/bin/bash
# Check test file to understand if threading is tested with Level 10
echo "=== Examining test_logging_replay.py for threading scenarios ==="
grep -n "thread\|Thread\|concurrent" tests/utils/test_logging_replay.py
echo ""
echo "=== Check the level10_environment fixture ==="
sed -n '47,70p' tests/utils/test_logging_replay.py
Repository: flashinfer-ai/flashinfer
Length of output: 1361
🏁 Script executed:
#!/bin/bash
# Search for all usages of _session_jsonl_initialized
rg -n "_session_jsonl_initialized" flashinfer/api_logging.py -B 2 -A 2Repository: flashinfer-ai/flashinfer
Length of output: 294
🏁 Script executed:
#!/bin/bash
# Get more context around the _dump_api function where globals are modified
sed -n '341,475p' flashinfer/api_logging.py | head -80
Repository: flashinfer-ai/flashinfer
Length of output: 3395
Add thread-safety synchronization for Level 10 dumping.
The implementation uses unprotected module-level global state that is modified during runtime. Confirmed issues:
-
Race conditions on global counters:
_dump_count(line 350, 471) and_dump_total_size_bytes(line 381, 472, 580) are read and written without synchronization. Concurrent calls from multiple threads can corrupt counters and bypass size/count limits. -
Dictionary race condition:
_dump_call_counter(lines 359-362) undergoes read-modify-write operations without synchronization, risking KeyError or incorrect call sequence numbers. -
Concurrent file writes:
_append_to_jsonl()(line 189) opens files in append mode without locks. Concurrent calls can corrupt session.jsonl and metadata.jsonl JSONL formatting.
FlashInfer does use threading (ThreadPoolExecutor in artifacts.py). Add threading.Lock() to protect critical sections, or use threading.local() (already used in fused_moe/utils.py) if per-thread state is appropriate. Additionally, _session_jsonl_initialized (line 69) appears unused—verify and remove if not needed.
_dump_count = 0
_dump_total_size_bytes = 0
_dump_call_counter = {}  # Track call count per function
_session_jsonl_initialized = False  # Track if session.jsonl header was written
Review multi-process safety and thread safety of global state.
The global counters _dump_count, _dump_total_size_bytes, and _dump_call_counter are not protected by locks and may have race conditions in multi-threaded environments. While each process gets its own copy (by design), within a single process multiple threads could corrupt these counters.
Additionally, _session_jsonl_initialized is declared but never used in the code.
Consider these improvements
- If multi-threading is expected, protect global state with locks:
import threading
_dump_state_lock = threading.Lock()
_dump_count = 0
_dump_total_size_bytes = 0
_dump_call_counter = {}

Then wrap modifications:

with _dump_state_lock:
    _dump_count += 1
    _dump_total_size_bytes += input_size

- Remove the unused _session_jsonl_initialized variable if it is not needed.
🤖 Prompt for AI Agents
In @flashinfer/api_logging.py around lines 66 - 69, The global counters
_dump_count, _dump_total_size_bytes, and _dump_call_counter are modified without
synchronization and can race in multi-threaded use; introduce a module-level
threading.Lock (e.g., _dump_state_lock) and guard all reads/writes/updates to
those globals (increments, size adds, and per-function counter updates) with
with _dump_state_lock: to ensure atomicity, and remove the unused
_session_jsonl_initialized variable if it is not referenced elsewhere in the
module.
def _append_to_jsonl(filepath: Path, record: Dict[str, Any]) -> None:
    """
    Append a JSON record as a single line to a JSONL file.

    Parameters
    ----------
    filepath : Path
        Path to the JSONL file
    record : Dict[str, Any]
        Record to append (will be serialized as single-line JSON)
    """
    with open(filepath, "a") as f:
        f.write(json.dumps(record) + "\n")
🧩 Analysis chain
🏁 Script executed:
# First, let's understand the file structure and find the relevant code
cd /repo && find . -type f -name "api_logging.py" | head -5
Repository: flashinfer-ai/flashinfer
Length of output: 123
🏁 Script executed:
# Check for threading/concurrent constructs in the codebase
rg -n "(threading|Thread|Pool|concurrent|lock|Lock)" --type=py | head -30
Repository: flashinfer-ai/flashinfer
Length of output: 2681
🏁 Script executed:
# Look at the full context of api_logging.py around the function
wc -l flashinfer/api_logging.py
Repository: flashinfer-ai/flashinfer
Length of output: 98
🏁 Script executed:
# Read the api_logging.py file to understand context and usage
cat -n flashinfer/api_logging.py | head -100
Repository: flashinfer-ai/flashinfer
Length of output: 4350
🏁 Script executed:
# Find all calls to _append_to_jsonl
rg -n "_append_to_jsonl" --type=py -B2 -A2
Repository: flashinfer-ai/flashinfer
Length of output: 1628
🏁 Script executed:
# Check for any @flashinfer_api or logging setup that might indicate concurrent usage
rg -n "@flashinfer_api|FLASHINFER_LOGLEVEL" --type=py -B2 -A2 | head -50
Repository: flashinfer-ai/flashinfer
Length of output: 3870
🏁 Script executed:
# See more context around session.jsonl usage and _DUMP_DIR initialization
cat -n flashinfer/api_logging.py | sed -n '430,490p'
Repository: flashinfer-ai/flashinfer
Length of output: 2674
🏁 Script executed:
# Check if session.jsonl has process ID substitution or if it's shared
cat -n flashinfer/api_logging.py | sed -n '46,52p'
Repository: flashinfer-ai/flashinfer
Length of output: 516
🏁 Script executed:
# Search for any locking mechanisms (fcntl, threading.Lock, etc.)
rg -n "(fcntl|flock|Lock|RLock|Semaphore|threading\.)" flashinfer/api_logging.py
Repository: flashinfer-ai/flashinfer
Length of output: 50
🏁 Script executed:
# Check the actual dump directory initialization and whether it uses %i
cat -n flashinfer/api_logging.py | sed -n '200,250p'
Repository: flashinfer-ai/flashinfer
Length of output: 1898
🏁 Script executed:
# Look at the broader context of where _append_to_jsonl is called from
cat -n flashinfer/api_logging.py | sed -n '440,480p'
Repository: flashinfer-ai/flashinfer
Length of output: 1903
🏁 Script executed:
# See if there's a test or documentation about thread safety
rg -l "thread|Thread|concurrent" tests/ --type=py | xargs grep -l "logging\|api_logging" 2>/dev/null | head -5
Repository: flashinfer-ai/flashinfer
Length of output: 50
🏁 Script executed:
# Look for documentation or decorators that describe thread safety
cat -n flashinfer/api_logging.py | sed -n '550,650p'
Repository: flashinfer-ai/flashinfer
Length of output: 4488
🏁 Script executed:
# Search for the @flashinfer_api decorator definition
rg -n "@flashinfer_api|def flashinfer_api" flashinfer/api_logging.py -A15
Repository: flashinfer-ai/flashinfer
Length of output: 1660
🏁 Script executed:
# Look at the full context of _dump_function_inputs to understand the context better
cat -n flashinfer/api_logging.py | sed -n '1,50p'
Repository: flashinfer-ai/flashinfer
Length of output: 1887
🏁 Script executed:
# Check if there are any comments about thread safety or limitations
rg -n "thread|Thread|safe|concurrent|race|lock" flashinfer/api_logging.py
Repository: flashinfer-ai/flashinfer
Length of output: 3025
🏁 Script executed:
# Look at how dump_dir is created - whether it uses process ID
cat -n flashinfer/api_logging.py | sed -n '330,380p'
Repository: flashinfer-ai/flashinfer
Length of output: 2236
Add file locking to _append_to_jsonl for thread/process safety.
The function appends to session.jsonl (line 468, 577) without synchronization. When multiple processes or threads write to this shared file, records may be interleaved or corrupted. Per-process metadata.jsonl files are safe due to unique paths, but the central session.jsonl requires protection.
Implement file locking using fcntl.flock (POSIX) or msvcrt.locking (Windows), or document that Level 10 logging is not suitable for multi-threaded/multi-process scenarios.
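For reference, a minimal sketch of the POSIX variant (this is not the PR's code; fcntl is POSIX-only, so Windows would still need msvcrt.locking or a documented limitation):

import fcntl
import json
from pathlib import Path
from typing import Any, Dict

def _append_to_jsonl_locked(filepath: Path, record: Dict[str, Any]) -> None:
    # Hold an exclusive advisory lock for the duration of the write so that
    # concurrent writers cannot interleave partial lines in session.jsonl.
    with open(filepath, "a") as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            f.write(json.dumps(record) + "\n")
            f.flush()  # make the record visible before releasing the lock
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)

The lock is advisory, so it only helps if every writer (all processes sharing the dump directory) goes through the same helper.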
global _dump_count, _dump_total_size_bytes

# Check include/exclude filters first (before any work is done)
if not _should_dump_function(func_name):
    _logger.debug(
        f"Skipping dump for {func_name} (filtered by include/exclude patterns)"
    )
    return None

if _dump_count >= _DUMP_MAX_COUNT:
    _logger.warning(
        f"Dump limit reached ({_DUMP_MAX_COUNT} dumps). Skipping dump for {func_name}. "
        f"Increase FLASHINFER_DUMP_MAX_COUNT if needed."
    )
    return None

try:
    # Get call counter for this function
    if func_name not in _dump_call_counter:
        _dump_call_counter[func_name] = 0
    _dump_call_counter[func_name] += 1
    call_seq = _dump_call_counter[func_name]

    # Create dump directory structure
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S_%f")[
        :-3
    ]  # Include milliseconds
    pid = os.getpid()
    dump_name = f"{timestamp}_pid{pid}_{func_name}_call{call_seq:04d}"
    dump_dir = Path(_DUMP_DIR) / dump_name
    dump_dir.mkdir(parents=True, exist_ok=True)

    # Extract tensors and metadata from inputs
    input_tensors, input_metadata = _extract_tensors_and_metadata(args, kwargs)

    # Calculate input size
    input_size = sum(_get_tensor_size_bytes(t) for t in input_tensors.values())

    # Check size limit (conservative check - only inputs for now)
    max_size_bytes = _DUMP_MAX_SIZE_GB * 1024 * 1024 * 1024
    if _dump_total_size_bytes + input_size > max_size_bytes:
        _logger.warning(
            f"Dump size limit reached ({_DUMP_MAX_SIZE_GB} GB). Skipping dump for {func_name}. "
            f"Increase FLASHINFER_DUMP_MAX_SIZE_GB if needed."
        )
        # Clean up empty directory
        dump_dir.rmdir()
        return None
Race condition in dump limit checks.
The size and count limit checks (lines 350-356, 381-388) read and compare global state without synchronization. In multi-threaded scenarios, multiple threads could pass these checks simultaneously and exceed the limits.
Example scenario:
- Thread A checks _dump_count < _DUMP_MAX_COUNT (999 < 1000) → passes
- Thread B checks _dump_count < _DUMP_MAX_COUNT (999 < 1000) → passes
- Both threads proceed to dump, resulting in 1001 total dumps
Consider:
- Using atomic operations or locks around limit checks and updates
- Accepting slight over-limit behavior and documenting it
- Documenting that Level 10 is not thread-safe
Related to earlier comment about global state thread safety.
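A minimal sketch of an atomic check-and-reserve (helper name and lock are illustrative, not the PR's code):

import threading

_dump_state_lock = threading.Lock()
_dump_count = 0
_dump_total_size_bytes = 0

def _try_reserve_dump(input_size: int, max_count: int, max_size_bytes: int) -> bool:
    # Check both limits and bump the counters under one lock, so two threads
    # cannot both observe 999 < 1000 and produce 1001 dumps.
    global _dump_count, _dump_total_size_bytes
    with _dump_state_lock:
        if _dump_count >= max_count:
            return False
        if _dump_total_size_bytes + input_size > max_size_bytes:
            return False
        _dump_count += 1
        _dump_total_size_bytes += input_size
        return True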
if output_tensors:
    if _DUMP_SAFETENSORS:
        # SafeTensors format: faster, no pickle, but loses stride/contiguity
        from safetensors.torch import save_file

        tensors_contiguous = {
            k: v.contiguous() for k, v in output_tensors.items()
        }
        save_file(tensors_contiguous, str(dump_path / "outputs.safetensors"))
    else:
        # torch.save format: preserves stride/contiguity
        torch.save(output_tensors, dump_path / "outputs.pt")
Missing ImportError handling for safetensors in output dump.
Unlike _dump_function_inputs (lines 402-407), this code doesn't catch ImportError when importing safetensors. If the package is missing, an unhandled exception will be raised instead of a clear error message.
Add consistent error handling
 if output_tensors:
     if _DUMP_SAFETENSORS:
         # SafeTensors format: faster, no pickle, but loses stride/contiguity
-        from safetensors.torch import save_file
-
-        tensors_contiguous = {
-            k: v.contiguous() for k, v in output_tensors.items()
-        }
-        save_file(tensors_contiguous, str(dump_path / "outputs.safetensors"))
+        try:
+            from safetensors.torch import save_file
+
+            tensors_contiguous = {
+                k: v.contiguous() for k, v in output_tensors.items()
+            }
+            save_file(tensors_contiguous, str(dump_path / "outputs.safetensors"))
+        except ImportError:
+            _logger.exception(
+                "safetensors package not installed. "
+                "Install with: pip install safetensors"
+            )
+            raise
     else:

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
if output_tensors:
    if _DUMP_SAFETENSORS:
        # SafeTensors format: faster, no pickle, but loses stride/contiguity
        try:
            from safetensors.torch import save_file

            tensors_contiguous = {
                k: v.contiguous() for k, v in output_tensors.items()
            }
            save_file(tensors_contiguous, str(dump_path / "outputs.safetensors"))
        except ImportError:
            _logger.exception(
                "safetensors package not installed. "
                "Install with: pip install safetensors"
            )
            raise
    else:
        # torch.save format: preserves stride/contiguity
        torch.save(output_tensors, dump_path / "outputs.pt")
🤖 Prompt for AI Agents
In @flashinfer/api_logging.py around lines 530 - 541, The output dumping branch
that uses safetensors (guarded by _DUMP_SAFETENSORS in the output_tensors block)
lacks ImportError handling; wrap the "from safetensors.torch import save_file"
import in a try/except ImportError and log a clear error via the same logger
used elsewhere (or raise a RuntimeError with a descriptive message) and fall
back to the torch.save path if desired; ensure the catch references the same
symbols (save_file, tensors_contiguous, dump_path) and mirrors the import-error
handling used in _dump_function_inputs so missing safetensors produces a clear
message rather than an unhandled exception.
/bot run
…nts are interleaved
38db19f to 1505266 (Compare)
/bot stop
The GitLab CI pipeline #41372000 has been cancelled.
Actionable comments posted: 0
🧹 Nitpick comments (2)
flashinfer/api_logging.py (2)
307-488: Input dumping implementation is solid. The function correctly implements crash-safe dumping with proper checks:
- Early filtering and limit checks before I/O
- Size validation before saving
- Comprehensive metadata collection
- Proper error handling
One optional improvement: Consider using _logger.exception() instead of _logger.error() at lines 403 and 483 to automatically include stack traces.

📝 Optional: Use logging.exception for better diagnostics

 except ImportError:
-    _logger.error(
+    _logger.exception(
         "safetensors package not installed. "
         "Install with: pip install safetensors"
     )

And similarly at line 483:

 except Exception as e:
-    _logger.error(f"Failed to dump function call {func_name}: {e}")
+    _logger.exception(f"Failed to dump function call {func_name}: {e}")
     import traceback
-
-    _logger.error(traceback.format_exc())

Note: When using logging.exception(), the explicit traceback logging becomes redundant.
490-595: Output dumping correctly complements input dumping. The function properly appends outputs to existing dumps and updates both per-dump and session metadata. The crash-safe design (inputs first, outputs after) is correct.

Same optional suggestion as above: consider using _logger.exception() for better diagnostics (lines 591, 594).
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- benchmarks/routines/gemm.py
- docs/logging.rst
- flashinfer/__main__.py
- flashinfer/api_logging.py
- tests/utils/test_logging_replay.py
🚧 Files skipped from review as they are similar to previous changes (1)
- benchmarks/routines/gemm.py
🧰 Additional context used
📓 Path-based instructions (2)
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Test implementations should use flashinfer.utils functions (get_compute_capability, is_sm90a_supported, is_sm100a_supported, etc.) to skip tests on unsupported GPU architectures
For testing with mpirun on multi-GPU systems, use the pattern: mpirun -np <num_gpus> pytest tests/path/to/test.py::test_function
Avoid OOM (out-of-memory) errors in tests by using appropriate problem sizes - tests/conftest.py provides auto-skipping for OOM tests as a safety net but should not be relied upon
Files:
tests/utils/test_logging_replay.py
flashinfer/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
flashinfer/**/*.py: Use @functools.cache decorator on Python API functions to implement module-level caching and avoid recompilation
Use @flashinfer_api decorator for debugging API calls, enable via FLASHINFER_LOGLEVEL environment variable (0=off, 1=basic, 3=detailed, 5=with stats)
Files:
flashinfer/__main__.py
flashinfer/api_logging.py
🧠 Learnings (6)
📓 Common learnings
Learnt from: CR
Repo: flashinfer-ai/flashinfer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-30T09:34:39.900Z
Learning: Applies to flashinfer/**/*.py : Use `flashinfer_api` decorator for debugging API calls, enable via `FLASHINFER_LOGLEVEL` environment variable (0=off, 1=basic, 3=detailed, 5=with stats)
📚 Learning: 2025-12-30T09:34:39.900Z
Learnt from: CR
Repo: flashinfer-ai/flashinfer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-30T09:34:39.900Z
Learning: Applies to tests/**/*.py : Test implementations should use `flashinfer.utils` functions (`get_compute_capability`, `is_sm90a_supported`, `is_sm100a_supported`, etc.) to skip tests on unsupported GPU architectures
Applied to files:
tests/utils/test_logging_replay.py
📚 Learning: 2025-12-30T09:34:39.900Z
Learnt from: CR
Repo: flashinfer-ai/flashinfer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-30T09:34:39.900Z
Learning: Applies to flashinfer/__init__.py : Export new operations in `flashinfer/__init__.py` to make them available as public API
Applied to files:
flashinfer/__main__.py
docs/logging.rst
📚 Learning: 2025-12-30T09:34:39.900Z
Learnt from: CR
Repo: flashinfer-ai/flashinfer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-30T09:34:39.900Z
Learning: Applies to flashinfer/**/*.py : Use `flashinfer_api` decorator for debugging API calls, enable via `FLASHINFER_LOGLEVEL` environment variable (0=off, 1=basic, 3=detailed, 5=with stats)
Applied to files:
flashinfer/__main__.py
docs/logging.rst
flashinfer/api_logging.py
📚 Learning: 2025-12-30T09:34:39.900Z
Learnt from: CR
Repo: flashinfer-ai/flashinfer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-30T09:34:39.900Z
Learning: Use `FLASHINFER_CUDA_ARCH_LIST` environment variable to specify target GPU architectures (e.g., '8.0 9.0a') and `FLASHINFER_NVCC_THREADS` to control parallel compilation threads
Applied to files:
docs/logging.rst
flashinfer/api_logging.py
📚 Learning: 2025-12-30T09:34:39.900Z
Learnt from: CR
Repo: flashinfer-ai/flashinfer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-30T09:34:39.900Z
Learning: Applies to flashinfer/**/*.py : Use `functools.cache` decorator on Python API functions to implement module-level caching and avoid recompilation
Applied to files:
flashinfer/api_logging.py
🧬 Code graph analysis (2)
tests/utils/test_logging_replay.py (4)
flashinfer/utils.py (1)
get_compute_capability (258-261)
benchmarks/routines/gemm.py (1)
to_float8 (164-170)
flashinfer/api_logging.py (2)
replay_sequence (987-1049)
replay_from_dump (700-984)
flashinfer/__main__.py (1)
cli(68-73)
flashinfer/__main__.py (1)
flashinfer/api_logging.py (2)
replay_sequence (987-1049)
replay_from_dump (700-984)
🪛 Ruff (0.14.10)
flashinfer/__main__.py
455-455: Do not catch blind exception: Exception
(BLE001)
flashinfer/api_logging.py
262-262: Do not catch blind exception: Exception
(BLE001)
403-406: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
457-457: Do not catch blind exception: Exception
(BLE001)
482-482: Do not catch blind exception: Exception
(BLE001)
483-483: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
486-486: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
590-590: Do not catch blind exception: Exception
(BLE001)
591-591: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
594-594: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
637-637: Consider moving this statement to an else block
(TRY300)
638-638: Do not catch blind exception: Exception
(BLE001)
752-752: Avoid specifying long messages outside the exception class
(TRY003)
757-757: Avoid specifying long messages outside the exception class
(TRY003)
761-761: Avoid specifying long messages outside the exception class
(TRY003)
777-780: Avoid specifying long messages outside the exception class
(TRY003)
782-784: Avoid specifying long messages outside the exception class
(TRY003)
855-858: Avoid specifying long messages outside the exception class
(TRY003)
914-914: Consider moving this statement to an else block
(TRY300)
915-915: Do not catch blind exception: Exception
(BLE001)
916-916: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
968-968: Do not catch blind exception: Exception
(BLE001)
969-969: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
972-972: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
1008-1008: Avoid specifying long messages outside the exception class
(TRY003)
1044-1044: Do not catch blind exception: Exception
(BLE001)
1046-1046: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
1514-1515: try-except-pass detected, consider logging the exception
(S110)
1514-1514: Do not catch blind exception: Exception
(BLE001)
1526-1526: Do not catch blind exception: Exception
(BLE001)
1527-1527: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
1553-1553: Do not catch blind exception: Exception
(BLE001)
1554-1554: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
1561-1561: Do not catch blind exception: Exception
(BLE001)
1562-1562: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Deploy Docs
🔇 Additional comments (14)
docs/logging.rst (1)
15-15: Documentation change looks good. The range update from 0-5 to 0-10 correctly reflects the new Level 10 Flight Recorder feature.
flashinfer/api_logging.py (7)
50-69: Configuration setup looks good. The environment-driven configuration with sensible defaults (20GB, 1000 dumps) provides a good balance between usability and safety. Global tracking variables are appropriately initialized at module level.
114-136: Security warning is a good practice. The warning clearly communicates the security implications of Level 10 logging and displays the current configuration, which helps users understand what will be dumped.
138-176: Filter logic is clean and well-documented. The include-then-exclude filtering logic is straightforward and correctly implements the documented behavior. Early filtering (before any I/O) is efficient.
178-223: JSONL helpers are simple and correct. The crash-safe JSONL format (one record per line) is a good choice for logging that survives crashes. Implementation is straightforward.
230-267: Value serialization is robust. The function handles various Python types appropriately. The blind exception handler at line 262 is acceptable here since this is a serialization helper that should be maximally robust and return something rather than crash.
700-984: Replay implementation is well-designed. Key strengths:
- Composite registry key (process_id, self_id) correctly prevents object collisions in multi-process environments
- Auto-detection of tensor format (torch.save vs safetensors) makes replay format-agnostic
- Proper handling of stateful wrapper classes (__init__, plan, run methods)
- Comprehensive argument reconstruction

The stateful replay logic (lines 876-934) correctly handles constructor calls and method calls with proper object registry management.
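For illustration, the kind of registry this describes might look like the following (names and signature are illustrative, not the PR's actual implementation):

from typing import Any, Dict, Tuple

# Reconstructed stateful objects, keyed by (process_id, self_id) as recorded
# in the dump metadata, so object ids from different processes never collide.
_replay_registry: Dict[Tuple[int, int], Any] = {}

def _get_or_construct(process_id: int, self_id: int, factory, *args, **kwargs) -> Any:
    key = (process_id, self_id)
    if key not in _replay_registry:
        # Replaying a recorded __init__ creates the wrapper; later recorded
        # plan()/run() calls are dispatched to this same object.
        _replay_registry[key] = factory(*args, **kwargs)
    return _replay_registry[key]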
1420-1570: Decorator integration is well-implemented. The decorator correctly integrates Level 10 dumping while maintaining backward compatibility:
- Inputs dumped BEFORE execution for crash safety
- Outputs dumped AFTER successful execution
- Logging capped at Level 5 for Level 10 to avoid redundant output
- Proper self_id capture for stateful objects
- Error handling ensures dump failures don't crash the API call
The warning at lines 1424-1426 appropriately flags this as experimental.
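As a rough sketch of that ordering (the dump helper names exist in this PR, but their exact signatures are assumed here for illustration):

import functools
import logging

_logger = logging.getLogger("flashinfer.api_logging")

def _flight_recorder(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        dump_dir = None
        try:
            # Inputs go to disk BEFORE the kernel runs, so a crash still
            # leaves a reproducible dump behind.
            dump_dir = _dump_function_inputs(func.__name__, args, kwargs)
        except Exception:
            _logger.exception("input dump failed; running the call anyway")
        result = func(*args, **kwargs)
        if dump_dir is not None:
            try:
                # Outputs are appended only after a successful execution.
                _dump_function_outputs(dump_dir, func.__name__, result)
            except Exception:
                _logger.exception("output dump failed; returning result anyway")
        return result
    return wrapper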
flashinfer/__main__.py (1)
386-457: CLI replay command is well-implemented. The implementation provides a good user experience:
- Clear error messages with color coding
- Auto-detection of single dump vs session
- Human-readable output with function names
- Summary statistics for sessions
The broad exception handler at line 455 is appropriate for a CLI command to provide friendly error messages rather than stack traces.
tests/utils/test_logging_replay.py (5)
39-88: Test fixture is well-designed. The level10_environment fixture properly:
- Saves and restores environment variables
- Forces module reimport to pick up configuration changes
- Provides isolated test environment with tmp_path
- Cleans up after each test
This ensures tests don't interfere with each other.
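A comparable minimal fixture, using pytest's monkeypatch for the save/restore part (illustrative sketch, not the test file's exact code):

import importlib
import pytest

@pytest.fixture
def level10_env(tmp_path, monkeypatch):
    dump_dir = tmp_path / "dumps"
    monkeypatch.setenv("FLASHINFER_LOGLEVEL", "10")
    monkeypatch.setenv("FLASHINFER_DUMP_DIR", str(dump_dir))
    import flashinfer.api_logging as api_logging
    importlib.reload(api_logging)  # re-read the env-driven module-level config
    return dump_dir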
157-194: GPU capability check is correct. Lines 160-161 properly check compute capability before running mm_fp8 tests. This follows the coding guideline to use get_compute_capability for architecture-specific tests.
Based on coding guidelines.
366-719: Filtering tests provide thorough coverage. The test suite comprehensively validates dump filtering:
- Include-only, exclude-only, and combined scenarios
- Edge cases like no matches and multiple patterns
- Proper environment isolation for each test
The consistent test structure makes them easy to maintain.
721-910: JSONL tests validate crash-safe logging. The tests properly verify:
- JSONL format (not JSON) is used
- Two-line structure per call (inputs_saved, then completed)
- Session accumulation across multiple API calls
- Proper metadata fields in each record
This ensures the crash-safe logging design works correctly.
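A small sketch of how such a session file can be checked (the "status" field name and its values are assumptions based on the description above):

import json
from pathlib import Path

def summarize_session(session_file: Path) -> dict:
    counts: dict = {}
    with open(session_file) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)  # each line is an independent JSON object
            status = record.get("status", "unknown")
            counts[status] = counts.get(status, 0) + 1
    return counts

# A call that crashed mid-execution shows up as an "inputs_saved" record with
# no matching "completed" record.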
912-1064: SafeTensors tests are well-structured. Both tests appropriately:
- Use pytest.importorskip to handle missing dependency
- Verify correct file format based on configuration
- Test replay auto-detection
- Check metadata consistency
The mock dump approach in test_safetensors_replay_auto_detection efficiently tests format detection without executing a full API call.
/bot run
if __name__ == "__main__":
    pytest.main([__file__, "-v", "-s"])
Will add another test with positional arguments only, for flashinfer-bench workflow.
[SUCCESS] Pipeline #41383969: 18/20 passed
yzh119
left a comment
I'm good with the PR overall, this feature should be really helpful in terms of reproducing bugs and collecting traces!
📌 Description
Flight Recorder: Level 10 API Logging with Tensor Dumps
This PR provides a "Flight Recorder" mechanism within FlashInfer that captures detailed API execution traces (inputs, outputs, metadata) to enable exact reproduction of bugs, performance regression testing, and offline benchmarking.
The feature is implemented as Level 10 of the existing API logging module. The expected workflow is:
- The user sets FLASHINFER_LOGLEVEL=10, then makes a FlashInfer API call.
- Before execution, input tensors are dumped to inputs.pt (or inputs.safetensors) and call metadata to metadata.json.
- After execution, output tensors are dumped to outputs.pt (or outputs.safetensors) and completion status is appended to metadata.jsonl.
- FLASHINFER_DUMP_DIR sets the dump location (default: flashinfer_dumps).
- Each dump directory is named {timestamp}_pid{process_id}_{func_name}_call{sequence}.
- Tensors are saved with torch.save (default) or optionally safetensors.
- session.jsonl provides a quick-scan overview of all recorded API calls.
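A minimal usage sketch of this workflow (the replay_from_dump signature and return value are assumed here for illustration):

import os

# Must be set before flashinfer is imported so Level 10 is picked up.
os.environ["FLASHINFER_LOGLEVEL"] = "10"
os.environ["FLASHINFER_DUMP_DIR"] = "./flashinfer_dumps"

import flashinfer  # noqa: E402

# ... run the workload; each decorated API call leaves a dump directory ...

# Later (possibly on another machine), replay a single recorded call:
from flashinfer.api_logging import replay_from_dump
outputs = replay_from_dump("./flashinfer_dumps/<timestamp>_pid<pid>_<func>_call0001")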
mm_fp4().flashinfer replay --dir <dump_path>.Environment Variables
Dump Filtering
For large-scale inference with many API calls, users can filter which APIs are dumped:
Filter logic:
- If DUMP_INCLUDE is set, only APIs matching at least one pattern are dumped
- If DUMP_EXCLUDE is set, APIs matching any pattern are skipped
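An illustrative sketch of that include-then-exclude rule (the pattern-matching style used here, fnmatch, is an assumption; the PR may match names differently):

from fnmatch import fnmatch
from typing import List

def should_dump(func_name: str, include: List[str], exclude: List[str]) -> bool:
    if include and not any(fnmatch(func_name, pat) for pat in include):
        return False  # an include list is set and nothing matched
    if any(fnmatch(func_name, pat) for pat in exclude):
        return False  # matched an exclude pattern
    return True

# e.g. include=["mm_*"], exclude=["mm_fp8*"] keeps mm_fp4 dumps but skips mm_fp8 ones.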
Implementation Details
Tensors are serialized with torch.save by default instead of raw bytes or simplified custom formats. FlashInfer kernels often rely on specific memory layouts (strides, non-contiguous views). torch.save preserves these properties faithfully, whereas reconstructing them from shape/stride metadata is error-prone and complex. An optional safetensors format is available for pickle-free serialization, though it does not preserve tensor strides.
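A small demonstration of the difference (sketch only; the file path is arbitrary):

import torch

x = torch.randn(8, 8)
col = x[:, ::2]                      # non-contiguous view, stride (8, 2)
assert not col.is_contiguous()

# torch.save round-trips the view metadata, including strides.
torch.save(col, "col.pt")
loaded = torch.load("col.pt")
assert loaded.stride() == col.stride() and not loaded.is_contiguous()

# safetensors only accepts contiguous tensors, so that path calls .contiguous()
# first and the original stride information is not preserved.
dense = col.contiguous()
assert dense.is_contiguous() and dense.stride() != col.stride()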
#2170
🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.
✅ Pre-commit Checks
- I have installed pre-commit by running pip install pre-commit (or used your preferred method).
- I have installed the hooks with pre-commit install.
- I have run pre-commit run --all-files and fixed any reported issues.

🧪 Tests
unittest, etc.).Reviewer Notes
Summary by CodeRabbit
New Features
Configuration
Documentation
Tests