fix: case-insensitive file extension detection in RAG data-type auto-detection #6400

Open
immuhammadfurqan wants to merge 1 commit into crewAIInc:main from immuhammadfurqan:fix/case-insensitive-file-type-detection

@immuhammadfurqan

Summary

DataTypes.from_content() matched file extensions case-sensitively, so files and URLs with uppercase extensions (.PDF, .CSV, .DOCX, …) were misrouted to the plain-text loader, which fed raw binary into the RAG index instead of the parsed document text. This PR lowercases the path before the extension comparison in get_file_type().

Fixes #6399

Changes

  • rag/data_types.py: lowercase the path before extension matching in get_file_type() (covers both local files and URLs).
  • tests/rag/test_data_types.py: new regression tests for mixed-case file and URL extensions.
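
For readers who have not opened the diff, the idea behind the change can be sketched roughly as follows. This is a minimal illustration, not the code in rag/data_types.py: the `SUFFIX_TO_TYPE` table, the `guess_type` name, and the returned labels are all hypothetical stand-ins.

```python
# Hypothetical suffix table; the real mapping in rag/data_types.py may differ.
SUFFIX_TO_TYPE = {
    ".pdf": "pdf",
    ".csv": "csv",
    ".docx": "docx",
    ".mdx": "mdx",
}


def guess_type(path: str, default: str = "text") -> str:
    """Return a type label for a local path or URL, ignoring extension case."""
    # Lowercase once, then compare against lowercase suffixes only.
    lowered = path.lower()
    for suffix, label in SUFFIX_TO_TYPE.items():
        if lowered.endswith(suffix):
            return label
    # Nothing matched: fall back to the plain-text label.
    return default


print(guess_type("Report.PDF"))                    # pdf
print(guess_type("https://example.com/Data.CSV"))  # csv
print(guess_type("README"))                        # text
```

Lowercasing the whole string, rather than only the extension, keeps the sketch to one normalization step that applies equally to file paths and URLs.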

Testing

  • New tests fail on main (12 failures across .PDF/.CSV/.DOCX/.MDX/.MD and URL cases) and pass with the fix (22 passed).
  • Existing tests/tools/rag/test_rag_tool_add_data_type.py still passes (39 passed).
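
The fail-on-main, pass-with-fix pattern can be reproduced in miniature. The sketch below is an assumption-laden stand-in, not the PR's test module: `detect`, `SUFFIXES`, `CASES`, and `failing_cases` are hypothetical names, and the `lowercase` flag only simulates the old and new behavior side by side.

```python
# Hypothetical stand-in for the detector; not the real get_file_type().
SUFFIXES = {".pdf": "pdf", ".csv": "csv", ".docx": "docx"}


def detect(path: str, *, lowercase: bool) -> str:
    # lowercase=False mimics the old case-sensitive comparison.
    candidate = path.lower() if lowercase else path
    for suffix, label in SUFFIXES.items():
        if candidate.endswith(suffix):
            return label
    return "text"


# Mixed-case inputs for both local files and URLs (illustrative values).
CASES = [
    ("report.PDF", "pdf"),
    ("Data.Csv", "csv"),
    ("https://example.com/files/Spec.DOCX", "docx"),
]


def failing_cases(*, lowercase: bool) -> list[str]:
    """Return the paths whose detected type differs from the expected one."""
    return [
        path
        for path, expected in CASES
        if detect(path, lowercase=lowercase) != expected
    ]


print(failing_cases(lowercase=False))  # every case is misdetected as "text"
print(failing_cases(lowercase=True))   # []
```

In a real suite the same case table would typically feed a parametrized test, so each path reports as its own pass or failure.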

…detection

DataTypes.from_content() matched file extensions case-sensitively, so files
and URLs with uppercase extensions (.PDF, .CSV, .DOCX, ...) were misrouted to
the plain-text loader, feeding raw binary into the RAG index. Lowercase the
path before comparison and add regression tests for mixed-case file/URL
extensions.

Fixes crewAIInc#6399
@coderabbitai

coderabbitai Bot commented Jun 30, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 95d68f5a-906f-41fc-9350-982fcdc714e0

📥 Commits

Reviewing files that changed from the base of the PR and between 694881c and 081e267.

📒 Files selected for processing (2)
  • lib/crewai-tools/src/crewai_tools/rag/data_types.py
  • lib/crewai-tools/tests/rag/test_data_types.py

📝 Walkthrough

The get_file_type logic in DataTypes.from_content now lowercases the input path before performing extension matching, fixing case-sensitive misclassification of uppercase/mixed-case file extensions. A new test module validates this behavior for both local file paths and URLs.

Changes

Case-Insensitive Extension Detection

| Layer / File(s) | Summary |
| --- | --- |
| Lowercase path matching and validation: lib/crewai-tools/src/crewai_tools/rag/data_types.py, lib/crewai-tools/tests/rag/test_data_types.py | get_file_type lowercases the path before comparing against extension suffixes, replacing case-sensitive matching; new parametrized tests cover local file paths and URLs with uppercase/mixed-case extensions (PDF/CSV/JSON/XML/DOCX/MDX/TXT), including a regression test for .PDF. |

No sequence diagram is warranted: the change is a single-function bug fix with a single component path (extension string matching) and no multi-actor interaction.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly describes the main change: case-insensitive file extension detection for RAG auto-detection. |
| Description check | ✅ Passed | The description matches the code changes and explains the bug fix, tests, and expected behavior. |
| Linked Issues check | ✅ Passed | The fix and regression tests address the linked issue's case-insensitive extension detection requirements. |
| Out of Scope Changes check | ✅ Passed | The PR only changes the detection logic and adds focused tests, with no obvious unrelated changes. |
Development

Successfully merging this pull request may close these issues.

[BUG] RAG file-type auto-detection is case-sensitive — uppercase extensions (.PDF, .CSV, .DOCX) misrouted to the text loader
