
improve: reponse handling for file chunks in tool's invocation #20523



Closed


bowenliang123 (Contributor) commented May 30, 2025

Important

  1. Make sure you have read our contribution guidelines
  2. Ensure there is an associated issue and you have been assigned to it
  3. Use the correct syntax to link this PR: Fixes #<issue number>.

Summary

  • Closes #20522 (improve reponse handling of file chunks in tool's invocation)

  • Adds the --strict-bytes flag when running mypy type checks. The flag was introduced in mypy 1.15 and will be enabled by default in mypy 2.x. With it enabled, mypy reports a type error in tool.py: core/plugin/impl/tool.py:146: error: Argument "blob" to "BlobMessage" has incompatible type "bytearray"; expected "bytes"  [arg-type]
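A minimal sketch of the kind of change this error calls for, assuming a message type whose blob field is annotated as bytes. BlobMessage and build_message below are hypothetical stand-ins, not Dify's actual classes. Without --strict-bytes, mypy implicitly treats bytearray as compatible with bytes; with the flag, the buffer has to be converted explicitly:

```python
from dataclasses import dataclass


@dataclass
class BlobMessage:
    """Hypothetical stand-in for a message whose blob field is typed as bytes."""

    blob: bytes


def build_message(buffer: bytearray) -> BlobMessage:
    # Passing `buffer` directly (BlobMessage(blob=buffer)) is accepted by mypy
    # by default, because bytearray is promoted to bytes. Under --strict-bytes
    # that promotion is disabled and mypy reports an [arg-type] error.
    # bytes(buffer) makes an immutable copy that matches the annotation.
    return BlobMessage(blob=bytes(buffer))
```

Because bytes(buffer) copies the data, later writes to the original bytearray do not leak into the message that was already handed off.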

  1. Precomputed Constants:

    • Added CHUNK_SIZE_LIMIT and FILE_SIZE_LIMIT constants for better maintainability
    • Replaced magic numbers with these constants
  2. File Size Validation:

    • Added early check for file size limit before processing chunks
    • Ensures we fail fast without unnecessary processing
  3. Memory Optimizations:

    • Used __slots__ in FileChunk to reduce per-instance memory overhead
    • Converted to bytes before yielding to release memory earlier
    • Used slicing with computed start position for efficient writes
    • Explicitly delete finished files using del
  4. Error Handling Improvements:

    • Added explicit size mismatch validation for final chunks
    • Standardized error messages with exact sizes
    • Added boundary validation before writes
  5. Efficiency Improvements:

    • Created file_chunk reference to avoid repeated dict lookups
    • Combined related operations into contiguous blocks
    • Reduced bytearray slicing calculations
  6. Type Safety:

    • Converted bytearray to immutable bytes when yielding
    • Added explicit error cases for size mismatches
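The reassembly flow described in the list above can be sketched roughly as follows. This is an illustrative reconstruction, not the code in the PR: BlobChunk and reassemble_chunks are hypothetical names, the two limit values are assumptions, and the sketch assumes the final chunk may also carry data and that every chunk of a file reports the same total_length.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

# The PR introduces constants with these names; the values here are assumptions.
CHUNK_SIZE_LIMIT = 8 * 1024          # max bytes in a single chunk
FILE_SIZE_LIMIT = 30 * 1024 * 1024   # max bytes in a reassembled file


@dataclass(frozen=True)
class BlobChunk:
    """Hypothetical stand-in for a blob-chunk message."""
    id: str
    total_length: int
    blob: bytes
    end: bool


class FileChunk:
    """Per-file reassembly buffer; __slots__ avoids a per-instance __dict__."""
    __slots__ = ("bytes_written", "total_length", "data")

    def __init__(self, total_length: int) -> None:
        self.bytes_written = 0
        self.total_length = total_length
        self.data = bytearray(total_length)  # preallocated buffer


def reassemble_chunks(chunks: Iterable[BlobChunk]) -> Iterator[tuple[str, bytes]]:
    files: dict[str, FileChunk] = {}
    for chunk in chunks:
        blob_data = chunk.blob
        if len(blob_data) > CHUNK_SIZE_LIMIT:
            raise ValueError(f"Chunk size {len(blob_data)} exceeds limit of {CHUNK_SIZE_LIMIT} bytes")

        # One lookup per chunk; keep a local reference afterwards.
        file_chunk = files.get(chunk.id)
        if file_chunk is None:
            # Fail fast: reject an oversized file before allocating its buffer.
            if chunk.total_length > FILE_SIZE_LIMIT:
                raise ValueError(f"File size {chunk.total_length} exceeds limit of {FILE_SIZE_LIMIT} bytes")
            file_chunk = files[chunk.id] = FileChunk(chunk.total_length)

        start = file_chunk.bytes_written
        stop = start + len(blob_data)
        # Boundary check before writing into the preallocated buffer.
        if stop > file_chunk.total_length:
            raise ValueError(f"Chunk overflows file: {stop} > {file_chunk.total_length} bytes")
        file_chunk.data[start:stop] = blob_data
        file_chunk.bytes_written = stop

        if chunk.end:
            if file_chunk.bytes_written != file_chunk.total_length:
                raise ValueError(
                    f"Size mismatch: received {file_chunk.bytes_written} bytes, "
                    f"expected {file_chunk.total_length}"
                )
            data = bytes(file_chunk.data)  # immutable copy for the consumer
            del files[chunk.id]            # release the buffer only after the final chunk
            yield chunk.id, data
```

Interleaved chunks for different files are handled independently because each file id owns its own FileChunk entry until its final chunk arrives.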

These changes improve:

  • Memory efficiency (especially with large numbers of concurrent files)
  • Processing speed (fewer repeated dictionary lookups and slicing calculations)
  • Readability and maintainability
  • Error prevention and detection
  • Faster failure on oversized or inconsistent input


Checklist

  • This change requires a documentation update, included: Dify Document
  • I understand that this PR may be closed in case there was no previous discussion or issues. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.
  • I ran dev/reformat(backend) and cd web && npx lint-staged(frontend) to appease the lint gods

dosubot bot added the size:M (This PR changes 30-99 lines, ignoring generated files.) and 🔨 feat:tools (Tools for agent, function call related stuff.) labels May 30, 2025
bowenliang123 changed the title from "improve: improve reponse handling for file chuncks in tool's invocation" to "improve: reponse handling for file chuncks in tool's invocation" May 30, 2025
crazywoola requested a review from Copilot May 31, 2025 01:34
Copilot AI (Contributor) left a comment

Pull Request Overview

This PR introduces stricter type checking and enhances the file‐chunk reassembly logic by enforcing size limits, optimizing memory use, and adding boundary validations.

  • Enable --strict-bytes in the dev mypy script
  • Define CHUNK_SIZE_LIMIT and FILE_SIZE_LIMIT with pre-checks
  • Add boundary checks, convert to immutable bytes, and use __slots__ for FileChunk

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File | Description
dev/mypy-check | Added --strict-bytes flag to the mypy command
api/core/plugin/impl/tool.py | Introduced size‐limit constants, slot‐based FileChunk, validations, and memory optimizations
Comments suppressed due to low confidence (2)

api/core/plugin/impl/tool.py:178

  • The del files[chunk_id] is unconditionally executed on every blob chunk, which deletes the buffer even for non-final chunks. This should be indented inside the if is_end: block so that buffers are only removed after the final chunk.
del files[chunk_id]
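To illustrate the placement issue raised in this comment, here is a simplified, hypothetical accumulator (collect is not a Dify function, and unlike the PR it appends to a growing bytearray instead of writing into a preallocated one) in which the per-file buffer is released only inside the final-chunk branch:

```python
def collect(chunks: list[tuple[str, bytes, bool]]) -> dict[str, bytes]:
    """Reassemble (chunk_id, data, is_end) tuples into complete blobs."""
    buffers: dict[str, bytearray] = {}
    completed: dict[str, bytes] = {}
    for chunk_id, data, is_end in chunks:
        buffers.setdefault(chunk_id, bytearray()).extend(data)
        if is_end:
            completed[chunk_id] = bytes(buffers[chunk_id])
            # Correct placement: drop the buffer only after the final chunk.
            del buffers[chunk_id]
        # If the del sat here, at loop-body level, every non-final chunk
        # would throw away the data accumulated so far for that file.
    return completed
```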

api/core/plugin/impl/tool.py:146

  • [nitpick] There are new error paths for chunk- and file-size violations; consider adding unit tests to verify both limits and their error messages.
if len(blob_data) > CHUNK_SIZE_LIMIT:
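One possible shape for the suggested tests, using the standard-library unittest module. check_limits is a hypothetical helper that isolates the two size checks, and the limit values are illustrative assumptions; tests in the real codebase would exercise the tool-invocation code path itself.

```python
import unittest

CHUNK_SIZE_LIMIT = 8 * 1024          # assumed value, for illustration only
FILE_SIZE_LIMIT = 30 * 1024 * 1024   # assumed value, for illustration only


def check_limits(blob_size: int, total_length: int) -> None:
    """Hypothetical validator mirroring the two new error paths."""
    if total_length > FILE_SIZE_LIMIT:
        raise ValueError(f"File size {total_length} exceeds limit of {FILE_SIZE_LIMIT} bytes")
    if blob_size > CHUNK_SIZE_LIMIT:
        raise ValueError(f"Chunk size {blob_size} exceeds limit of {CHUNK_SIZE_LIMIT} bytes")


class SizeLimitTest(unittest.TestCase):
    def test_values_at_limits_are_accepted(self) -> None:
        check_limits(CHUNK_SIZE_LIMIT, FILE_SIZE_LIMIT)  # must not raise

    def test_chunk_over_limit_is_rejected(self) -> None:
        with self.assertRaisesRegex(ValueError, r"Chunk size \d+ exceeds limit"):
            check_limits(CHUNK_SIZE_LIMIT + 1, 1024)

    def test_file_over_limit_is_rejected(self) -> None:
        with self.assertRaisesRegex(ValueError, r"File size \d+ exceeds limit"):
            check_limits(1, FILE_SIZE_LIMIT + 1)
```

The test case can be run with python -m unittest; pytest also collects unittest.TestCase subclasses.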

bowenliang123 changed the title from "improve: reponse handling for file chuncks in tool's invocation" to "improve: reponse handling for file chunks in tool's invocation" Jun 1, 2025
dosubot bot added the size:L (This PR changes 100-499 lines, ignoring generated files.) and size:M (This PR changes 30-99 lines, ignoring generated files.) labels and removed the size:M and size:L labels Jun 1, 2025
bowenliang123 force-pushed the Improve-tool-invoke branch from 940365e to 6209108 June 3, 2025 14:59
bowenliang123 deleted the Improve-tool-invoke branch June 5, 2025 05:28
Labels
🔨 feat:tools Tools for agent, function call related stuff. size:M This PR changes 30-99 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

improve reponse handling of file chunks in tool's invocation
1 participant